


WORLD INTELLECTUAL PROPERTY ORGANIZATION 
International Bureau





PCT

INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT)



(51) International Patent Classification: H04B 15/00

(11) International Publication Number: WO 97/11538 A1

(43) International Publication Date: 27 March 1997 (27.03.97) 



(21) International Application Number: PCT/US96/14682

(22) International Filing Date: 13 September 1996 (13.09.96)



(30) Priority Data:
08/529370, 18 September 1995 (18.09.95), US



(71) Applicants: INTERVAL RESEARCH CORPORATION
[US/US]; Building C, 1801 Page Mill Road, Palo Alto,
CA 94304 (US). NGO, John-Thomas, Calderon [-/US];
719 Liverpool Way, Sunnyvale, CA 94087 (US). BHADKAMKAR,
Neal, Ashok [-/US]; 369 Matadero Avenue,
Palo Alto, CA 94306 (US).

(74) Agents: YIN, Ronald, L. et al.; Limbach & Limbach L.L.P.,
2001 Ferry Building, San Francisco, CA 94111 (US).



(81) Designated States: AU, BR, CA, CN, JP, European patent
(AT, BE, CH, DE, DK, ES, FI, FR, GB, GR, IE, IT, LU,
MC, NL, PT, SE).



Published 

With international search report. 



(54) Title: AN ADAPTIVE FILTER FOR SIGNAL PROCESSING AND METHOD THEREFOR 




(57) Abstract 
[...] each of a plurality of transducers. This invention estimates the relative propagation delays among the transducers for each source. First, it randomly generates a fixed number of sets of delay parameters (220), called a population (230). Each set is processed by an instantaneous-[...] least instantaneous performance value, it is incorporated into the population. The set with the least cumulative performance value is deleted [...]

An Adaptive Filter for Signal Processing
and Method Therefor



This application is submitted with a microfiche appendix containing copyrighted material.

Copyright 1994, Interval Research Corporation. The Appendix consists of one (1) microfiche with forty-six (46) frames. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever in the appendices.

Technical Field

This invention relates to the field of adaptive filter signal processing, and more particularly to an adaptive filter signal processor for use in a two-stage microphone-array signal processor for separating sounds acoustically mixed in the presence of echoes and reverberations. More particularly, the present invention relates to an improved Direction of Arrival Estimator that estimates the optimal delay values for use in the direct signal separator, i.e., the first stage in the two-stage signal processor.

Background of the invention 

It is well known that a human being can focus attention on a single source of sound even in an environment that contains many such sources. This phenomenon is often called the "cocktail-party effect."

Considerable effort has been devoted in the prior art to solving the cocktail-party problem, both in physical devices and in computational simulations of such devices. In a co-pending application we have disclosed a two-stage microphone-array signal processor in which a first stage partially accomplishes the intended aim using information about the physical directions from which signals are arriving. However, that application discloses the use of a conventional direction-of-arrival estimator to estimate the various delays in the acoustic waves, which are used in the first stage to produce the signals. This application discloses an improved direction-of-arrival estimator.

Estimation of signal parameters is pivotal in such a processor, as well as in a variety of other signal-processing applications, such as source separation and source localization. Examples of signal parameters are the differences in propagation delays from a given source to the various sensors, and the direction from which a given source signal is arriving.

One prior art solution to the problem of signal-parameter estimation is to use directional sensors: 
either a collection of such sensors can be mounted so as to tessellate the set of directions of interest, or a 
single such sensor can be used to scan the possible directions. In either case, the strategy is to identify 
directions from which maximum signal power is detected. 

Another prior-art approach is to scan with a directional sensor employing classical beamforming. Signals from a collection of sensors at different spatial locations are subjected to relative delays and then averaged together, such that waves arriving from a particular "look direction" are enhanced relative to waves arriving from other directions. Thus, the beamformer behaves as a directional sensor whose direction of maximum sensitivity can be changed without physically reconfiguring the sensors. Flanagan et al. (1985) described a system in which speakers in an auditorium are located by a beamformer that continuously scans the auditorium floor.
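The scanning principle can be illustrated with a minimal two-sensor delay-and-sum sketch. It is not the system of Flanagan et al.; it assumes integer-sample delays, a broadband random test source, and an illustrative helper `delay_and_sum_power`. Each candidate lag plays the role of a look direction, and the lag giving the greatest output power is reported.

```python
import random

def delay_and_sum_power(x1, x2, lag):
    """Mean power of a two-sensor delay-and-sum beamformer steered to `lag` samples.

    Sensor 2 is advanced by `lag` samples and averaged with sensor 1; only time
    indices for which both samples exist are used.
    """
    n = len(x1)
    total, count = 0.0, 0
    for t in range(max(0, -lag), min(n, n - lag)):
        y = 0.5 * (x1[t] + x2[t + lag])
        total += y * y
        count += 1
    return total / count

rng = random.Random(0)
source = [rng.uniform(-1.0, 1.0) for _ in range(4000)]  # broadband test source
true_lag = 3                                             # hypothetical propagation delay
x1 = source
x2 = [0.0] * true_lag + source[:-true_lag]               # same wave, arriving 3 samples later

# Sweep the "look direction" over candidate lags and keep the most powerful one.
powers = {lag: delay_and_sum_power(x1, x2, lag) for lag in range(-5, 6)}
best_lag = max(powers, key=powers.get)
print(best_lag)  # 3
```

Each power estimate needs many samples to be reliable, which is why the sweep rate of such a scanner is limited.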

There are at least two fundamental shortcomings of this approach. First, the rate at which a new source can be located and tracked is limited by the rate at which the space of possible directions can be swept; that rate depends, in turn, on the length of time required to obtain a reasonably accurate estimate of the signal power being received from a particular direction. Second, resolution is fundamentally limited, since the signal averaging done by a classical beamformer cannot provide relative attenuation of frequency components whose wavelengths are larger than the array. For example, a microphone array one foot wide cannot attenuate frequency components in a sound that are below approximately 1 kHz.
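The one-foot, 1 kHz figure follows from $f = c/\lambda$. The snippet below is a back-of-the-envelope check, assuming a speed of sound of about 343 m/s (air near 20 °C); it computes the frequency whose wavelength equals the array width.

```python
SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at roughly 20 degrees C
FOOT = 0.3048           # metres per foot

def frequency_at_wavelength(wavelength_m, c=SPEED_OF_SOUND):
    """Frequency (Hz) whose acoustic wavelength equals wavelength_m: f = c / wavelength."""
    return c / wavelength_m

array_width = 1.0 * FOOT
f_cutoff = frequency_at_wavelength(array_width)
print(f"{f_cutoff:.0f} Hz")  # 1125 Hz
```

Components below roughly 1.1 kHz have wavelengths longer than a one-foot array, consistent with the approximate figure above; doubling the array width halves this limit.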

Other prior art increases the speed of source localization with a strategy that involves, in essence, shifting signals from two or more sensors in time to achieve a best match. A simple version of this principle is applied in a commercial device called the Direction Finding Blast Gauge (DFBG), developed by G. Edwards of GD Associates (1994). Designed to determine the direction from which short explosive sounds arrive, the DFBG records the times at which a shock wave traverses four symmetrically placed pressure sensors. Direction of arrival is calculated from these times in a conventional manner. The DFBG is designed for short explosive sounds that are relatively isolated.

The same principle used in the DFBG has been applied in a more refined way to other types of sound. Coker and Fischell (1986) patented a system that localizes signals from speech, which consists approximately of a train of energy bursts. Sugie et al. (1988), Kaneda (1993), and Bhadkamkar and Fowler (1993) implemented systems that exploit the tendency of naturally occurring sounds to have fairly well-defined onsets: associated with each microphone is a detector that produces a spike whenever an onset occurs. All of the systems mentioned in this paragraph operate by finding inter-microphone time delays that maximize the rate at which coincident spikes are detected. They compensate for noise in the individual time-delay estimates by generating joint estimates from multiple measurements. When many sources of sound are present, these statistical estimates are derived by histogram methods. Thus, the accuracy of the method depends on an assumption that, over the time scale on which the individual time-delay estimates are made, the spike trains come predominantly from one source.
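The coincidence-and-histogram idea can be sketched as follows. This is a minimal illustration, not any of the cited systems: onset spikes are given as integer sample times, every pair of spikes from the two microphones whose separation is within a hypothetical `max_lag` votes for that separation, and the most frequent separation is taken as the joint time-delay estimate.

```python
from collections import Counter

def coincidence_delay(spikes1, spikes2, max_lag):
    """Estimate the inter-microphone delay (in samples) from onset spike times.

    Every pair of spikes within max_lag of each other votes for its time
    difference; the histogram peak is returned (None if no pair qualifies).
    """
    votes = Counter()
    for a in spikes1:
        for b in spikes2:
            lag = b - a
            if abs(lag) <= max_lag:
                votes[lag] += 1
    if not votes:
        return None
    return votes.most_common(1)[0][0]

# Hypothetical onsets: microphone 2 hears the source 4 samples after microphone 1,
# plus one spurious coincidence (200 -> 205) and one unmatched spike (170).
mic1 = [10, 50, 90, 130, 200]
mic2 = [14, 54, 94, 134, 170, 205]
print(coincidence_delay(mic1, mic2, max_lag=10))  # 4
```

Because every source contributes votes at its own lag, the histogram peak reflects whichever source dominates the spike trains.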

In a similar vein, noisy time-delay estimates can be obtained from narrowband signals by comparing their complex phases, provided that the magnitudes of the delays are known in advance not to exceed the period of the center frequency. Joint estimates can be produced, as above, by histogram methods. The front end of a system described by Morita (1991) employs a scheme of this type.
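A minimal sketch of phase comparison for a narrowband signal follows. It assumes a pure tone at a known center frequency `f0`, a record holding an integer number of cycles, and, more strictly than the condition stated above, a delay magnitude below half a period, so that the wrapped phase difference maps to a unique signed delay. The helper names are illustrative.

```python
import cmath
import math

def phasor(x, f0, fs):
    """Complex amplitude of x at frequency f0 (single-bin DFT, i.e. quadrature demodulation)."""
    return sum(v * cmath.exp(-2j * math.pi * f0 * n / fs) for n, v in enumerate(x))

def phase_delay(x1, x2, f0, fs):
    """Delay of x2 relative to x1 from their complex phase difference at f0.

    Valid only if |delay| < 1 / (2 * f0); larger delays alias because phase wraps at +/- pi.
    """
    z = phasor(x2, f0, fs) * phasor(x1, f0, fs).conjugate()
    return -cmath.phase(z) / (2 * math.pi * f0)

fs, f0, tau = 8000.0, 500.0, 0.0003   # 500 Hz tone and 0.3 ms delay (hypothetical values)
n = 800                                # 0.1 s of data: exactly 50 cycles of f0
x1 = [math.cos(2 * math.pi * f0 * k / fs) for k in range(n)]
x2 = [math.cos(2 * math.pi * f0 * (k / fs - tau)) for k in range(n)]
tau_est = phase_delay(x1, x2, f0, fs)
print(round(tau_est * 1e3, 3), "ms")  # 0.3 ms
```

With noisy signals, each such estimate is itself noisy, which is why the text pools many of them by histogram methods.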

The approaches to sound localization employed by Edwards (1994), by Coker and Fischell (1986), by Sugie et al. (1988), by Kaneda (1993), by Bhadkamkar and Fowler (1993), and by Morita (1991) accommodate multiple sources by exploiting an assumption that, in most time intervals over which individual time-delay estimates are made, signal from only one source is present.

The techniques to which we now turn our attention, like the present invention, differ from these in two ways. First, they do not rely on the assumption that at most one source is present per measurement interval. Second, they are designed for a more general domain of signal-parameter estimation, of which time-delay estimation is only one example.

The so-called "signal-subspace" techniques, as exemplified by ESPRIT (Roy et al., 1988) and its predecessor, MUSIC (Schmidt, 1981), are specific to narrowband signals. The collection of n sensor signals is treated as a 2n-dimensional vector that varies in time. From prior knowledge of the array configuration, it is known which dimensions of this 2n-dimensional signal space would be accessible to the signal vector arising from a single source, as a function of the desired signal parameters. Moreover, by diagonalizing the covariance matrix of the observed signal vector and by inspecting the resulting eigenvalues and eigenvectors, it is known which dimensions of the signal space are actually spanned by the signal vector. Estimates for the desired signal parameters are extracted by comparing these two pieces of information. This prior art requires that the source signals not be fully correlated; otherwise, the observed covariance matrix contains insufficient information to extract the signal parameters.

The shortcomings of this prior art relative to the present invention are (a) that it is computationally intensive; (b) that the sensor-array configuration must be partially known; and (c) that the number of sensors must far exceed the number of sources (otherwise, the signal vector can span all available dimensions).

Another prior-art approach is based on an assumption that source signals are statistically independent. The condition of statistical independence is much more stringent than that of decorrelation: whereas two signals x(t) and y(t) are decorrelated if $E\{x(t)\,y(t)\} = E\{x(t)\}\,E\{y(t)\}$, they are statistically independent only if $E\{f[x(t)]\,g[y(t)]\} = E\{f[x(t)]\}\,E\{g[y(t)]\}$ for any scalar functions f and g. (The notation $E\{\cdot\}$ denotes an estimate of the expected value of the enclosed expression, as computed by a time average.) In return for requiring the more stringent conditions on the nature of the source signals, statistical-independence techniques do not always require prior knowledge of the array configuration. (For example, if the signal parameters to be estimated are the relative propagation delays from the sources to the sensors, nothing need be known about the array geometry; however, inter-sensor distances are obviously needed if the relative time delays are to be translated into physical directions of arrival.)
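The gap between the two conditions can be made concrete with a standard textbook example, not taken from the source: let x be symmetric about zero and y = x². The time averages below (E{·} as in the text) show that x and y are decorrelated, yet the choice f(x) = x², g(y) = y exposes their dependence.

```python
def time_average(values):
    """E{.}: estimate of the expected value as a time average."""
    return sum(values) / len(values)

N = 2001
x = [-1.0 + 2.0 * k / (N - 1) for k in range(N)]  # samples symmetric about zero on [-1, 1]
y = [v * v for v in x]                            # y is completely determined by x

# Decorrelation test: E{x y} - E{x} E{y}, which vanishes here by symmetry.
decorrelation_gap = time_average([a * b for a, b in zip(x, y)]) - time_average(x) * time_average(y)

# Independence test with f(x) = x^2 and g(y) = y: E{f(x) g(y)} - E{f(x)} E{g(y)}.
fx = [v * v for v in x]
independence_gap = time_average([a * b for a, b in zip(fx, y)]) - time_average(fx) * time_average(y)

print(f"{decorrelation_gap:.6f} {independence_gap:.3f}")
```

The first gap is zero to rounding error while the second is clearly nonzero, so passing the decorrelation test says nothing about independence.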

Moreover, these techniques accomplish the more difficult goal of source separation, i.e., recovery of the original source signals from the observed mixtures; signal-parameter estimation is normally a by-product.

Like signal-subspace techniques, some statistical-independence techniques begin by diagonalizing the covariance matrix, because doing so decorrelates the components of the signal vector. However, they are distinct because they use the results of diagonalization in different ways. Critical to understanding the distinction is that neither parameter estimation nor source separation can be achieved merely by diagonalizing the covariance matrix (Cardoso, 1989; Lacoume and Ruiz, 1992; Moulines and Cardoso, 1992). Whereas the signal-subspace techniques fill in the missing information by employing knowledge of the sensor-array configuration, the statistical-independence techniques do so by applying additional conditions based on the sensor-signal statistics alone.

Statistical-independence techniques can be summarized as follows. The source signals are treated as a signal vector, as are the sensor signals. The processes by which the source signals propagate to the sensors are modeled as a transformation H from the source signal vector to the sensor signal vector. This transformation can consist of a combination of additions, multiplications, time delays, and convolutive filters. If the transformation H were known, then its inverse (or pseudoinverse) could be computed and applied to the sensor signal vector to recover the original sources. However, since the source locations relative to the sensor array are not known, the inverse transformation must be estimated. This is done by finding a transformation S that, when applied to the sensor signal vector, produces an output signal vector whose components are statistically independent. Statistical-independence techniques are relevant in the present context inasmuch as signal parameters can often be estimated from S.

Most existing statistical-independence techniques that are adaptive, i.e., able to alter S in response to changes in the mixing transformation H, operate by maintaining a single estimate of S. As new data from the signal stream become available, they are incorporated by updating the estimate of S. These are referred to as "single-estimate" techniques because memory of past data is contained, at any given instant, in a single estimate of S.

The chief goal of this invention is to remove a dilemma, common to single-estimate techniques, that arises from the presence of two opposing factors in the selection of time constants for adaptation. On one hand, the mixing transformation H can undergo large, abrupt, sporadic changes; the abruptness of these changes calls for short time constants so that the system can adapt quickly. On the other hand, the source signals can be such that they are statistically independent only over relatively long time scales: when averaging is performed over shorter time scales, statistical error and coincidental correlation imply deviations from true statistical independence. To obtain accurate estimates of S from such data, it would seem necessary to employ long time scales.

Previous adaptive techniques for parameter estimation based on statistical independence are now reviewed; in each case it is shown why the given technique can be considered a single-estimate procedure.

In one class of single-estimate techniques, the following steps are executed periodically. Joint statistics of the signal vector are averaged over a set of times defined relative to the current time, with inverse time constant D. A value of S, called S', is chosen such that it enforces certain conditions of statistical independence exactly over the averaging time. The estimate of S used to generate the separated outputs of the system may be S' itself, but more commonly it is a smoothed version of S', with smoothing parameter μ. A well-known technique in this class is Fourth Order Blind Identification (FOBI), in which S' is computed by simultaneously diagonalizing the covariance and quadricovariance matrices, thus satisfying conditions of the form $E\{x(t)\,y(t)\} = 0$ and $E\{x^3(t)\,y(t) + x(t)\,y^3(t)\} = 0$ (Cardoso, 1989; Lacoume and Ruiz, 1992). Since x(t) and y(t) are assumed to have zero mean, the two equations are necessary conditions for statistical independence.

Another class of techniques, whose genesis is attributed to Herault and Jutten (1986), is closely related to the LMS algorithm (Widrow, 1970). S is taken to be a matrix of simple gains; thus, it is assumed that each source signal arrives at every microphone simultaneously. The quantity ΔS is computed at every time step, based on the instantaneous value of the output vector, $y = Sx$:

$$\Delta S_{ij} = f[y_i(t)]\, g[y_j(t)],$$

where f and g are different odd nonlinear functions.

The estimate S is updated at every time step according to the rule $S(\text{new}) = S(\text{old}) + \mu\,\Delta S$. We will refer to the use of rules of this type as "weight-update" techniques. This technique works because, when the components of the output-signal vector y are statistically independent, the statistical relation $E\{f[y_i(t)]\,g[y_j(t)]\} = E\{f[y_i(t)]\}\,E\{g[y_j(t)]\}$ holds for any different i and j. Since the signals have zero mean and f and g are odd, $E\{f[y_i(t)]\,g[y_j(t)]\} = 0$. Consequently, $E\{\Delta S\} = 0$; thus, the transformation S fluctuates about a fixed point. (If f and g were both linear, then the Herault-Jutten technique would only decorrelate the outputs; and decorrelation is insufficient for statistical independence.)
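A minimal sketch of the weight-update rule as stated above follows. It assumes f(u) = u³ and g(u) = arctan(u), one commonly cited pair of odd nonlinearities; adapts only the off-diagonal gains (an assumption); and uses a hypothetical 2×2 gain mixing matrix with uniformly distributed, zero-mean sources. It does not demonstrate convergence, which depends on the sign convention and network form of a given formulation. It checks only the fixed-point property argued in the text: the time-averaged update vanishes when the outputs are independent, and not otherwise.

```python
import math
import random

def f(u):
    return u ** 3         # assumed odd nonlinearity

def g(u):
    return math.atan(u)   # assumed second, different odd nonlinearity

def delta_s(S, x):
    """Delta S_ij = f[y_i] g[y_j] for i != j (diagonal left at 0), with outputs y = S x."""
    n = len(S)
    y = [sum(S[i][j] * x[j] for j in range(n)) for i in range(n)]
    return [[f(y[i]) * g(y[j]) if i != j else 0.0 for j in range(n)] for i in range(n)]

def hj_step(S, x, mu):
    """One weight update, S(new) = S(old) + mu * Delta S, as written in the text."""
    d = delta_s(S, x)
    return [[S[i][j] + mu * d[i][j] for j in range(len(S))] for i in range(len(S))]

def mean_delta_s(S, samples):
    """Time average E{Delta S} over a batch of sensor vectors, with S held fixed."""
    n = len(S)
    total = [[0.0] * n for _ in range(n)]
    for x in samples:
        d = delta_s(S, x)
        for i in range(n):
            for j in range(n):
                total[i][j] += d[i][j]
    return [[v / len(samples) for v in row] for row in total]

rng = random.Random(0)
sources = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(20000)]  # independent, zero mean
sensors = [(a + 0.5 * b, b) for a, b in sources]  # hypothetical gain mixing H = [[1, 0.5], [0, 1]]

S_separating = [[1.0, -0.5], [0.0, 1.0]]  # inverse of H: outputs equal the sources
S_identity = [[1.0, 0.0], [0.0, 1.0]]     # outputs remain mixed

at_fixed_point = mean_delta_s(S_separating, sensors)  # off-diagonal entries close to 0
still_mixed = mean_delta_s(S_identity, sensors)       # entry [0][1] clearly positive
print(at_fixed_point, still_mixed)
```

Any single `hj_step` still moves S by a nonzero amount; only the average update vanishes, which is the source of the residual fluctuation discussed below.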

A host of other weight-update techniques have been derived from Herault and Jutten (1986). Some variants employ a different criterion for separation, in which $y_i(t)$ is decorrelated with $y_j(t - T_1)$ and with $y_j(t - T_2)$, where $T_1$ and $T_2$ are different time delays determined in advance (Tong et al., 1991; Abed-Meraim et al., 1994; Van Gerven and Van Compernolle, 1994; Molgedey and Schuster, 1994). Other variants of Herault-Jutten employ a transformation S whose elements are time delays (Platt and Faggin, 1992; Bar-Ness, 1993) or convolutive filters (Jutten et al., 1992; Van Compernolle and Van Gerven, 1992). By going beyond the simple gains employed by Herault and Jutten, these techniques come closer to separating signals that have been mixed in the presence of propagation delay and echoes. Certain difficulties in adaptation associated with the use of convolutive filters are addressed by a recent class of two-stage structures (Dinç and Bar-Ness, 1993; Najar et al., 1994) in which a second weight-update stage is placed before or after the first, to improve the quality of separation attained after convergence.

Single-estimate techniques for adaptive source separation all suffer to some extent from the dilemma described above: the rate of adaptation cannot be increased without simultaneously increasing the size of fluctuations present even after convergence is reached. In some practical applications, there exist settings of μ for which both the rate of adaptation and the size of the fluctuations are acceptable; in others, there is a pressing need for techniques in which adaptation and fluctuation are not intrinsically co-dependent.

To accomplish this end, the present invention is founded on a break from the tradition of maintaining a single estimate of S and incorporating new data by changing the value of S. Rather, multiple estimates of S, called hypotheses, are maintained. Associated with each hypothesis is a number called the cumulative performance value. In contrast with single-estimate techniques, which incorporate new data by perturbing the estimate of S, the present technique incorporates new data by perturbing the cumulative performance values. Thus, each hypothesis remains unchanged from the time it is created to the time that it is destroyed. At any given instant, the hypothesis of greatest cumulative performance value is taken to be the correct one. Optionally, smoothing may be employed to avoid discontinuous changes in S; however, this smoothing is not an intrinsic part of the adaptive process.
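The contrast between the two styles of adaptation can be sketched on a toy scalar problem. Everything here is an illustrative assumption rather than the invention's actual performance function: a hypothetical true value of S (a scalar, 0.3), noisy per-step evidence about it, an instantaneous performance value of $-(\text{obs} - h)^2$, a fixed grid of hypotheses, and a single-estimate rule with rate `mu`.

```python
import random

def compare_adaptation(true_value=0.3, noise=0.2, steps=2000, mu=0.05, seed=1):
    """Track a scalar 'S' with a single estimate and with fixed hypotheses."""
    rng = random.Random(seed)
    hypotheses = [k / 10 for k in range(11)]  # fixed candidate values of S, never modified
    cumulative = [0.0] * len(hypotheses)      # cumulative performance values
    single = 0.0                              # the single-estimate technique's running estimate
    chosen, single_track = [], []
    for _ in range(steps):
        obs = true_value + rng.gauss(0.0, noise)  # hypothetical noisy evidence about S
        # Single-estimate style: new data perturb the estimate itself.
        single += mu * (obs - single)
        # Hypothesis style: new data perturb only the cumulative performance values.
        for i, h in enumerate(hypotheses):
            cumulative[i] += -(obs - h) ** 2      # assumed instantaneous performance value
        best = max(range(len(hypotheses)), key=cumulative.__getitem__)
        chosen.append(hypotheses[best])
        single_track.append(single)
    return chosen, single_track

chosen, single_track = compare_adaptation()
print(chosen[-1], min(single_track[-100:]), max(single_track[-100:]))
```

Once one hypothesis leads by a wide margin, the selected value stops changing, whereas the single estimate keeps wandering by an amount tied to `mu`.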

Until this point we have discussed related art for achieving sound separation, which is one 
particular application of adaptive filtering. However, the present invention is contemplated to have 
potential uses in other applications of adaptive filtering. We now therefore turn our attention to related 
art in the broader field of adaptive filtering. Compared to any given variant of the standard techniques, 
the present invention is superior in at least one of the following ways: 

• There is essentially no restriction on the form of the performance function, although from a 
practical standpoint certain performance functions can be computationally intensive to 
handle. 

• Once convergence is achieved, it does not necessarily exhibit fluctuations in proportion to 
the rate of adaptation. 

• It does not necessarily fail when the performance function is underdetermined.

These attractive properties of the present invention will be explained in the context of the standard 
techniques. 

We first describe stochastic-gradient (SG) techniques and explain the shortcoming of SG techniques that the present invention is intended to avoid. We describe how least-squares (LS) techniques overcome this limitation, and state why LS techniques cannot always be used. We identify a source of difficulty in adaptive filtering that we call instantaneous underdetermination, and explain why the present invention is superior to SG and LS with respect to its handling of instantaneous underdetermination.

We then discuss a less standard, but published, technique based on the genetic algorithm (GA). We state why the GA might be considered prior art, then differentiate the present invention from it.

Problems in adaptive filtering can generally be described as follows. One or more time-varying input signals $x_i(t)$ are fed into a filter structure H with adjustable coefficients $h_i$, i.e., $H = H(h_1, h_2, \ldots)$. Output signals $y_i(t)$ emerge from the filter structure. For example, an adaptive filter with one input x(t) and one output y(t), in which H is a linear, causal FIR filter, might take the form:

$$y(t) = \sum_{k=0}^{N} h_k\, x(t - k\Delta t),$$

where N is the order of the filter.

In general, H might be a much more complicated filter structure with feedback, cascades, parallel 
elements, nonlinear operations, and so forth. 
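The simple FIR form translates directly into code. A minimal sketch, assuming sampled signals with Δt equal to one sample and x(t) = 0 before the first sample; `fir_output` is an illustrative name.

```python
def fir_output(h, x):
    """y[n] = sum_k h[k] * x[n - k]: causal FIR filter with taps h and zero initial state."""
    return [
        sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
        for n in range(len(x))
    ]

impulse = [1.0, 0.0, 0.0, 0.0]
print(fir_output([0.5, 0.25], impulse))  # [0.5, 0.25, 0.0, 0.0]
```

Feeding in an impulse returns the tap vector itself, which is a quick check that the indexing matches the summation.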

The object is to adjust the unknowns $h_i$ such that a performance function C(t) is maximized over time. The performance function is usually a function of statistics computed over recent values of the outputs $y_i(t)$; but in general, it may also depend on the input signals, the filter coefficients, or other quantities such as parameter settings supplied exogenously by a human user.

The most basic SG technique, Least Mean Square (LMS), can be used for instances of this problem in which the gradient of the performance function C(t) with respect to the filter coefficients $h_i(t)$ can be expressed as a time average of some quantities $G_i(t)$. The Herault-Jutten algorithm for sound separation, described previously, may be regarded as an example of LMS.

For example, consider the transmission-line echo-cancellation problem typical in the design of 
full-duplex modems: 
$$y(t) = x_1(t) + \sum_{k} h_k\, x_2(t - k\Delta t),$$

$$C(t) = -E[y^2(t)].$$

Differentiating C with respect to the filter coefficients $h_i$, we have:

$$\frac{\partial C}{\partial h_i} = -E[2\,y(t)\,x_2(t - i\Delta t)] = E[G_i],$$

where

$$G_i(t) = -2\,y(t)\,x_2(t - i\Delta t).$$

The LMS algorithm calls for updating the coefficients $h_i$ as follows:

$$h_i(\text{new}) = h_i(\text{old}) + \mu\, G_i(t),$$

where μ is a step size that is chosen in advance. It may be seen that when the coefficients $h_i$ are near a local maximum of C, i.e., where the gradient of C is zero and the Hessian matrix of C is negative definite, then $E[G_i] = 0$ and the coefficients fluctuate about that maximum.
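The echo-cancellation example can be turned into a minimal LMS sketch. It assumes C(t) = -E[y²(t)], so that maximizing C minimizes residual echo power and $G_i(t) = -2y(t)x_2(t - i\Delta t)$; a hypothetical three-tap echo path; no near-end speech; and an illustrative step size. Under these assumptions the coefficients h should converge to the negated echo path.

```python
import random

def lms_echo_canceller(x1, x2, taps, mu):
    """LMS for y(t) = x1(t) + sum_k h_k x2(t - k), climbing C = -E[y^2]."""
    h = [0.0] * taps
    residual = []
    for t in range(len(x1)):
        past = [x2[t - k] if t - k >= 0 else 0.0 for k in range(taps)]  # x2(t - k dt)
        y = x1[t] + sum(hk * xk for hk, xk in zip(h, past))
        # G_i(t) = -2 y(t) x2(t - i dt); update h_i <- h_i + mu * G_i(t)
        h = [hi + mu * (-2.0 * y * xi) for hi, xi in zip(h, past)]
        residual.append(y)
    return h, residual

rng = random.Random(0)
far_end = [rng.uniform(-1.0, 1.0) for _ in range(5000)]  # x2: signal sent down the line
echo_path = [0.5, -0.3, 0.2]                              # hypothetical line echo
echo = [sum(echo_path[k] * far_end[t - k] for k in range(3) if t - k >= 0)
        for t in range(len(far_end))]                     # x1: received echo only
h, residual = lms_echo_canceller(echo, far_end, taps=3, mu=0.05)
print([round(v, 3) for v in h])  # [-0.5, 0.3, -0.2]
```

With a noise-free echo the residual goes to zero; with near-end noise added to `echo`, the coefficients would keep fluctuating by an amount that grows with `mu`.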

Thus, unlike the present invention, SG techniques require that the performance function be differentiable, and that the gradient be numerically easy to compute.

In addition, SG techniques suffer from a key problem: for any nonzero step size μ, the coefficients fluctuate. Stated more generally, two important performance criteria, the rate of adaptation and the steady-state error in the outputs, are at odds with each other in the setting of μ. Numerous solutions to this dilemma have been proposed:

• In block-based variants of LMS, updates to the filter coefficients are accumulated over blocks of time and added to the coefficients at the end of each block, instead of being added at every time step.

• In adaptive-step-size LMS, the quantity μ is adjusted according to a predefined schedule ("gearshifting") or is itself adapted as a parameter of the system using an algorithm similar to LMS ("gradient adaptive step size").
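The block-based variant in the first bullet can be sketched on a hypothetical echo-cancellation setup like the one in the text. The assumptions are C(t) = -E[y²(t)], a three-tap echo path, and illustrative step size and block length. The terms $\mu G_i(t)$ are summed over each block while h is held fixed, then applied once at the block's end.

```python
import random

def block_lms(x1, x2, taps, mu, block):
    """Block LMS: accumulate mu * G_i(t) over `block` samples, then update h once."""
    h = [0.0] * taps
    pending = [0.0] * taps
    for t in range(len(x1)):
        past = [x2[t - k] if t - k >= 0 else 0.0 for k in range(taps)]
        y = x1[t] + sum(hk * xk for hk, xk in zip(h, past))  # h stays frozen within a block
        pending = [p + mu * (-2.0 * y * xi) for p, xi in zip(pending, past)]
        if (t + 1) % block == 0:                              # end of block: apply the sum
            h = [hi + p for hi, p in zip(h, pending)]
            pending = [0.0] * taps
    return h

rng = random.Random(1)
far_end = [rng.uniform(-1.0, 1.0) for _ in range(6000)]
echo_path = [0.5, -0.3, 0.2]                                  # hypothetical echo path
echo = [sum(echo_path[k] * far_end[t - k] for k in range(3) if t - k >= 0)
        for t in range(len(far_end))]
h = block_lms(echo, far_end, taps=3, mu=0.02, block=10)
print([round(v, 3) for v in h])
```

Averaging over a block smooths each individual update, but the coefficients still move whenever the block sum is nonzero, so the variant mitigates rather than removes fluctuation.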

These incremental modifications to the LMS algorithm mitigate, but do not eliminate, the problem of fluctuation. This problem cannot be eliminated fully without abandoning the principle of operation behind LMS and its variants: that only the time-averaged update $E[\mu G_i]$ (not the individual update, $\mu G_i$) is zero when the filter is fully adapted.

The stochastic fluctuations just described are absent in the so-called least-squares (LS) techniques, of which Recursive Least Squares (RLS) is the archetype. The operations performed by RLS are mathematically equivalent to maximizing C(t) exactly at every time step; thus, in RLS, the computed filter coefficients converge asymptotically to their correct values as fast as the statistics estimated in the computation of C(t) converge.

Maximizing C(t) exactly at every time step would be computationally prohibitive if done by brute force, because C(t) is a functional that depends on integrals of the data from the beginning of time. In RLS, a practical level of efficiency is achieved by a mathematical trick that permits the exact maximum of C(t) for the current time step to be computed from the exact maximum computed at a previous time step and the intervening data.
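For concreteness, a standard textbook form of exponentially weighted RLS for an FIR filter is sketched below; it is not taken from the source. It minimizes $\sum_n \lambda^{t-n}[d(n) - w^\top u(n)]^2$, which is the same as maximizing a C(t) defined as the negative of that sum. The forgetting factor `lam`, the initialization constant `delta`, and the test signals are illustrative assumptions. The recursion for P plays the role of the "mathematical trick": it carries the inverse of the weighted input correlation matrix forward one sample at a time, so no matrix is ever inverted from scratch.

```python
import random

def rls_fir(x, d, taps, lam=0.99, delta=1e-3):
    """Exponentially weighted RLS: w tracks argmin sum_n lam^(t-n) (d[n] - w.u[n])^2."""
    w = [0.0] * taps
    # P approximates the inverse of the weighted input correlation matrix.
    P = [[(1.0 / delta) if i == j else 0.0 for j in range(taps)] for i in range(taps)]
    for t in range(len(x)):
        u = [x[t - k] if t - k >= 0 else 0.0 for k in range(taps)]  # tap-input vector
        Pu = [sum(P[i][j] * u[j] for j in range(taps)) for i in range(taps)]
        denom = lam + sum(ui * pui for ui, pui in zip(u, Pu))
        gain = [v / denom for v in Pu]                              # gain vector
        err = d[t] - sum(wi * ui for wi, ui in zip(w, u))           # a priori error
        w = [wi + gi * err for wi, gi in zip(w, gain)]
        # P <- (P - gain u^T P) / lam; P is symmetric, so u^T P is Pu transposed.
        P = [[(P[i][j] - gain[i] * Pu[j]) / lam for j in range(taps)] for i in range(taps)]
    return w

rng = random.Random(2)
x = [rng.uniform(-1.0, 1.0) for _ in range(300)]
w_true = [0.5, -0.3, 0.2]                                           # hypothetical system
d = [sum(w_true[k] * x[t - k] for k in range(3) if t - k >= 0) for t in range(len(x))]
w = rls_fir(x, d, taps=3)
print([round(v, 4) for v in w])
```

The per-sample cost grows with the square of the number of taps, which is practical only because the cost function has this special quadratic structure.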

An LS technique is often the method of choice when a problem can be formulated appropriately. However, because of their reliance on "mathematical tricks" to maximize C(t) exactly and efficiently at every time step, LS techniques have been developed only for certain restricted classes of adaptive-filtering problems:

• H is a FIR filter and

$$C(t) = \sum_{n \ge 0} a(n)\, b[y(t - n\Delta t) - d(t - n\Delta t)],$$

where d(t) is a desired signal, a(·) is one of a handful of averaging functionals that can be computed recursively, and b[·] is typically [...];

• H is a lattice filter and C(t) is as above;

• H is a filter with a systolic architecture and C(t) is as above;

• H is an IIR filter and C(t) is in "equation-error" form.

In contrast, the present invention requires neither a closed-form solution for the maximum of C(t) nor an efficient update scheme for computing the current maximum of C(t) from a past maximum and the intervening data. Thus, the present invention may be considered an alternative to LS and SG methods for situations in which the latter do not work well enough and the former do not exist.

In addition to the comparisons made so far, differences between LS, SG, and this invention arise in the handling of instantaneous underdetermination. For the present purposes, we use the term "instantaneously underdetermined" to describe a performance function for which global maximization at any particular instant does not necessarily produce the desired filter coefficients: the "correct" filter coefficients might not be the only ones that produce a maximal or numerically near-maximal value of the performance function. Such degeneracy can occur when the locus of global maxima of the performance function is delocalized, or when the basin of attraction is of sufficiently low curvature in one or more dimensions that noise and numerical imprecision can cause the apparent maximum to move far from the ideal location. When a performance function is instantaneously underdetermined, snapshots of the performance surface from many different instants in time may together contain enough information to determine a solution.

LS methods operate by attempting to find the maximum of C(t) at every instant in time, effectively by solving a system of linear equations. Underdetermination is manifested as ill-conditioning or rank deficiency of the coefficient matrix. The solution returned by such an ill-conditioned system of linear equations is very sensitive to noise.
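The sensitivity can be seen in a toy 2×2 system, not taken from the source, whose two columns are nearly parallel, as happens when the performance surface has very low curvature along one direction. A perturbation of 10⁻⁶ in the right-hand side moves the solution by an amount of order one.

```python
def solve_2x2(a, b):
    """Solve a @ z = b for a 2x2 matrix a by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    z0 = (b[0] * a[1][1] - a[0][1] * b[1]) / det
    z1 = (a[0][0] * b[1] - a[1][0] * b[0]) / det
    return z0, z1

eps = 1e-6
A = [[1.0, 1.0], [1.0, 1.0 + eps]]             # nearly rank-deficient: columns almost parallel
clean = solve_2x2(A, [2.0, 2.0 + eps])         # exact answer is (1, 1)
noisy = solve_2x2(A, [2.0, 2.0 + eps + 1e-6])  # right-hand side perturbed by 1e-6
print(clean, noisy)
```

The noisy solution lands near (0, 2) instead of (1, 1): a change in the data far below any realistic noise level moves the apparent maximum a long way.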

The behavior of SG methods and of the present invention with respect to instantaneous underdetermination is more difficult to analyze, since both systems involve the accumulation of small changes over time. To simplify the comparison, note that the present invention can use a standard SG technique, or any other method for adaptive filtering, as a source of guesses at the correct filter coefficients. Thus, such a combination can always find the correct (or nearly correct) filter coefficients if the SG technique alone can find them. The two methods differ in what happens after such coefficients are found. In the case of SG, fluctuations can eventually cause the coefficients $h_i(t)$ to drift far from their correct values because, with instantaneous underdetermination, the update vectors $\mu G_i(t)$ can point away from the correct values (Cohen, 1991). In the case of combining SG with the present invention, if a set of correct filter coefficients ever appears in the population of candidate parameter sets, the invention as a whole will use those coefficients without fluctuation.

We turn our attention in the remainder of this section to published work in adaptive filtering based on the genetic algorithm (GA):

R. Nambiar, C.K.K. Tang, and P. Mars, "Genetic and Learning Automata Algorithms for Adaptive Digital Filters," ICASSP '92, IV:41-44 (1992).

S.J. Flockton and M.S. White, "Pole-Zero System Identification Using Genetic Algorithms," Fifth International Conference on Genetic Algorithms, 531-535 (1993).

S.C. Ng, C.-Y. Chung, S.H. Leung, and Andrew Luk, "Fast Convergent Search for Adaptive IIR Filtering," ICASSP '94, III:105-108 (1994).
These applications of the GA to adaptive filtering are similar in form to the present invention because the GA employs a population of candidate solutions.

The works of Calloway (1991), Etter and Dayton (1983), Etter et al. (1982), Eisenhammer et al. (1993), Flockton and White (1993), Michielssen et al. (1992), Suckley (1992), and Xu and Daley (1992) are the easiest to eliminate from consideration as prior art because the problems under consideration did not require adaptation: the object was to adjust unknown filter coefficients such that the frozen performance function C(t) is maximized at a single instant in time. Thus, the design of these GAs does not take into account any of the difficulties associated with maximizing a time-varying performance function consistently over time.

The remaining known applications of the GA to signal processing are examples of adaptive filtering in the sense that they do accommodate time variation of the performance function. These are Nambiar et al. (1992), Nambiar and Mars (1992), Ng et al. (1994), Patiakin (1993), and Stearns et al. (1982). In every case, however, the only way that time variation is taken into account is via what might be called the "headstart" principle. Rather than attempt to find the optimum of C(t) independently for every time t, these methods use the population of solutions from a time in the recent past as an initial guess at the solution for the current time. In some cases, rounds of gradient descent are interleaved with short runs of the GA, but the principle is the same. All of these GAs operate with the goal of finding the global optimum of C(t) for every time t.

These GAs do not solve the fundamental problem that a snapshot of the performance function at any given time contains noise. When the performance function fluctuates, so do the optimized filter parameters. One way to make each snapshot more reliable, and therefore make the optimized filter parameters fluctuate less, is to lengthen the time windows of the averages used to compute the performance function. Another way is to smooth the optimized filter parameters in time. Both approaches make the system slow to react to changes. Moreover, the fluctuations and sensitivity to noise mentioned in this paragraph are exacerbated if the performance function is underdetermined.

It is desirable to develop a method for adaptive filtering that applies even when the performance function cannot be maximized analytically or differentiated, that does not exhibit fluctuations in proportion to the rate of adaptation, and that copes well with noise and underdetermination.

Summary of The Invention 

The present invention is a method and an apparatus for determining a set of a plurality of parameter values for use in an adaptive filter signal processor to process a plurality of signals to generate a plurality of processed signals.

The method stores a plurality of the sets, called a population. The population is initially generated at random. Each set is evaluated according to an instantaneous performance value. The instantaneous performance value of each set is added, at each clock cycle, to a cumulative performance value. At any given time, the set with the greatest cumulative performance value is used in the adaptive filter signal processor.

Periodically, a new parameter set is generated at random for possible incorporation into the population. If the instantaneous performance value of the new parameter set exceeds the least instantaneous performance value in the population, it is incorporated into the population. The set with the least cumulative performance value is deleted from the population, and the new parameter set is assigned an initial cumulative performance value that falls short of the greatest cumulative performance value by a fixed amount.
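The population scheme summarized above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation in the microfiche appendix: the performance function `perf`, the population size, the replacement interval `new_every`, the fixed shortfall `handicap`, and the uniform sampling range for new sets are all hypothetical choices.

```python
import random

def run_population(perf, n_params, steps, pop_size=8, new_every=5,
                   handicap=1.0, seed=0):
    """Return the parameter set used by the filter at each clock cycle.

    perf(params, step) -> instantaneous performance value (larger is better).
    pop_size, new_every, handicap (the fixed shortfall) and the uniform
    [-1, 1] sampling range are illustrative assumptions, not from the text.
    """
    rng = random.Random(seed)

    def random_set():
        return [rng.uniform(-1.0, 1.0) for _ in range(n_params)]

    population = [random_set() for _ in range(pop_size)]
    cumulative = [0.0] * pop_size
    used = []
    for step in range(steps):
        inst = [perf(p, step) for p in population]
        # Each clock cycle, add every set's instantaneous value to its total.
        cumulative = [c + v for c, v in zip(cumulative, inst)]
        if step % new_every == 0:  # "periodically" (interval assumed)
            candidate = random_set()
            if perf(candidate, step) > min(inst):
                # Delete the set with the least cumulative value ...
                worst = min(range(pop_size), key=lambda i: cumulative[i])
                # ... and start the newcomer a fixed amount below the best.
                start = max(cumulative) - handicap
                population[worst] = candidate
                cumulative[worst] = start
        # The set with the greatest cumulative value drives the filter.
        best = max(range(pop_size), key=lambda i: cumulative[i])
        used.append(population[best])
    return used
```

Because selection is based on the cumulative value, a newcomer that scores well on a single noisy snapshot does not immediately take over; it must outperform the incumbent over time to close the handicap.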

Brief Description of The Drawings 

Figure 1 is a schematic block diagram of an embodiment of an acoustic signal processor, using two microphones, with which the adaptive filter signal processor of the present invention may be used.

Figure 2 is a schematic block diagram of an embodiment of the direct-signal separator portion, i.e., the first stage of the processor shown in Figure 1.

Figure 3 is a schematic block diagram of an embodiment of the crosstalk remover portion, i.e., the second stage of the processor shown in Figure 1.

Figure 4 is a schematic block diagram of an embodiment of a direct-signal separator suitable for use with an acoustic signal processor employing three microphones.

Figure 5 is a schematic block diagram of an embodiment of a crosstalk remover suitable for use with an acoustic signal processor employing three microphones.

Figure 6 is an overview of the delay in the acoustic waves arriving at the direct signal separator portion of the signal processor of Figure 1, showing the separation of the signals.

Figure 7 is an overview of a portion of the crosstalk remover of the signal processor of Figure 1, showing the removal of the crosstalk from one of the signal channels.

Figure 8 is a schematic block level diagram of an embodiment of the DOA estimator of the present invention for use in the acoustic signal processor of Figure 1.

Detailed Description of The Invention

The present invention comprises an embodiment of a direction of arrival estimator portion of an adaptive filter signal processor, and the adaptive filter signal processor itself. These may be used in a device that mimics the cocktail-party effect using a plurality of microphones, an equal number of output audio channels, and a signal-processing module. When situated in a complicated acoustic environment that contains multiple audio sources with arbitrary spectral characteristics, the processor as a whole supplies output audio signals, each of which contains sound from at most one of the original sources. These separated audio signals can be used in a variety of applications, such as hearing aids or voice-activated devices.

Figure 1 is a schematic diagram of a signal separator processor of one embodiment of the present invention. As previously discussed, the signal separator processor of the present invention can be used with any number of microphones. In the embodiment shown in Figure 1, the signal separator processor receives signals from a first microphone 10 and a second microphone 12, spaced apart by about two centimeters. As used herein, the microphones 10 and 12 include transducers (not shown), their associated pre-amplifiers (not shown), and A/D converters 22 and 24 (shown in Figure 2).

The microphones 10 and 12 in the preferred embodiment are omnidirectional microphones, each of which is capable of receiving acoustic wave signals from the environment and of generating an acoustic electrical signal; the microphones 10 and 12 generate first and second acoustic electrical signals 14 and 16, respectively. The microphones 10 and 12 are either selected or calibrated to have matching sensitivity. The use of matched omnidirectional microphones 10 and 12, instead of directional or other microphones, leads to simplicity in the direct-signal separator 30, described below. In the preferred embodiment, two Knowles EM-3046 omnidirectional microphones were used, with a separation of 2 centimeters. The pair was mounted at least 25 centimeters from any large surface in order to preserve the omnidirectional nature of the microphones. Matching was achieved by connecting the two microphone outputs to a stereo microphone preamplifier and adjusting the individual channel gains so that the preamplifier outputs were closely matched. The preamplifier outputs were each digitally sampled at 22,050 samples per second, simultaneously. These sampled electrical signals 14 and 16 are supplied to the direct signal separator 30 and to a Direction of Arrival (DOA) estimator 20.

The direct-signal separator 30 employs information from a DOA estimator 20, which derives its estimate from the microphone signals. In a different embodiment of the invention, DOA information could come from a source other than the microphone signals, such as direct input from a user via an input device.

The direct signal separator 30 generates a plurality of output signals 40 and 42. The direct signal separator 30 generates as many output signals 40 and 42 as there are microphones 10 and 12, and hence as there are input signals 14 and 16 supplied to the direct signal separator 30. Assuming that there are two sources, A and B, generating acoustic wave signals in the environment in which the signal processor 8 is located, then each of the microphones 10 and 12 would detect acoustic waves from both sources. Hence, each of the electrical signals 14 and 16, generated by the microphones 10 and 12, respectively, contains components of sound from sources A and B.

The direct-signal separator 30 processes the signals 14 and 16 to generate the signals 40 and 42, respectively, such that in anechoic conditions (i.e., the absence of echoes and reverberations), each of the signals 40 and 42 would be an electrical signal representation of sound from only one source. In the absence of echoes and reverberations, the electrical signal 40 would be of sound only from source A, with electrical signal 42 being of sound only from source B, or vice versa. Thus, under anechoic conditions the direct-signal separator 30 can bring about full separation of the sounds represented in signals 14 and 16. However, when echoes and reverberation are present, the separation is only partial.

The output signals 40 and 42 of the direct signal separator 30 are supplied to the crosstalk remover 50. The crosstalk remover 50 removes the crosstalk between the signals 40 and 42 to bring about fully separated signals 60 and 62, respectively. Thus, the direct-signal separator 30 and the crosstalk remover 50 play complementary roles in the system 8. The direct-signal separator 30 is able to bring about full separation of signals mixed in the absence of echoes and reverberation, but produces only partial separation when echoes and reverberation are present. The crosstalk remover 50 when used alone is often able to bring about full separation of sources mixed in the presence of echoes and reverberation, but is most effective when given inputs 40 and 42 that are partially separated.

After some adaptation time, each output 60 and 62 of the crosstalk remover 50 contains the signal from only one sound source: A or B. Optionally, these outputs 60 and 62 can be connected individually to post filters 70 and 72, respectively, to remove known frequency coloration produced by the direct signal separator 30 or the crosstalk remover 50. Practitioners skilled in the art will recognize that there are many ways to remove this known frequency coloration; these vary in terms of their cost and effectiveness. An inexpensive post filtering method, for example, consists of reducing the treble and boosting the bass.

The filters 70 and 72 generate output signals 80 and 82, respectively, which can be used in a variety of applications. For example, they may be connected to a switch box and then to a hearing aid.

Referring to Figure 2, there is shown one embodiment of the direct signal separator 30 portion of the signal processor 8 of the present invention. The microphone transducers generate input signals 11 and 13, which are sampled and digitized by clocked sample-and-hold circuits followed by analog-to-digital converters 22 and 24, respectively, to produce sampled digital signals 14 and 16, respectively.
The digital signal 14 is supplied to a first delay line 32. In the preferred embodiment, the delay line 32 delays the digitally sampled signal 14 by a non-integral multiple of the sampling interval T, which was 45.35 microseconds given the sampling rate of 22,050 samples per second. The integral portion of the delay was implemented using a digital delay line, while the remaining subsample delay of less than one sample interval was implemented using a non-causal, truncated sinc filter with 41 coefficients. Specifically, to implement a subsample delay of t, given that t < T, the following filter is used:

$$y(n) = \sum_{k=-20}^{20} w(k)\, x(n-k)$$

where x(n) is the signal to be delayed, y(n) is the delayed signal, and w(k) (k = -20, -19, ..., 20) are the 41 filter coefficients. The filter coefficients are determined from the subsample delay t as follows:

$$w(k) = \frac{1}{S}\,\operatorname{sinc}\!\left(\pi\left[\frac{t}{T} - k\right]\right)$$

where

$$\operatorname{sinc}(a) = \begin{cases} \sin(a)/a & \text{if } a \neq 0 \\ 1 & \text{otherwise} \end{cases}$$

and S is a normalization factor given by

$$S = \sum_{k=-20}^{20} \operatorname{sinc}\!\left(\pi\left[\frac{t}{T} - k\right]\right).$$
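The 41-tap subsample delay above can be sketched with NumPy. This is an illustration under assumptions, not the software that ran on the workstation: the function names are hypothetical, and samples outside the input are treated as zero. Note that `np.sinc(x)` computes sin(πx)/(πx), so `np.sinc(t/T - k)` equals sinc(π[(t/T) - k]) in the notation above.

```python
import numpy as np

def subsample_delay_taps(t, T, half_width=20):
    """Weights w(k), k = -half_width..half_width, for a delay 0 <= t < T.

    w(k) = (1/S) sinc(pi[(t/T) - k]); dividing by the sum implements S.
    """
    k = np.arange(-half_width, half_width + 1)
    raw = np.sinc(t / T - k)  # np.sinc(x) = sin(pi x) / (pi x)
    return k, raw / raw.sum()

def apply_subsample_delay(x, t, T, half_width=20):
    """y(n) = sum_k w(k) x(n - k), treating samples outside x as zero."""
    _, w = subsample_delay_taps(t, T, half_width)
    # np.convolve(x, w)[m] = sum_j w[j] x[m - j] with j = k + half_width,
    # so output sample n is found at position m = n + half_width.
    full = np.convolve(x, w)
    return full[half_width:half_width + len(x)]

T = 1.0 / 22050  # sampling interval, about 45.35 microseconds
```

For a low-frequency input, delaying by `0.5 * T` yields approximately the input sampled half an interval later; the truncation to 41 taps leaves a small residual error.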

The output of the first delay line 32 is supplied to the negative input of a second combiner 38. The first digital signal 14 is also supplied to the positive input of a first combiner 36. Similarly, for the other channel, the second digital signal 16 is supplied to a second delay line 34, whose output is supplied to the negative input of the first combiner 36.

In the preferred embodiment, the sample-and-hold and A/D operations were implemented by the audio input circuits of a Silicon Graphics Indy workstation, and the delay lines and combiners were implemented in software running on the same machine.

However, other delay lines, such as analog delay lines, surface acoustic wave delays, digital low-pass filters, or digital delay lines with higher sampling rates, may be used in place of the digital delay lines 32 and 34. Similarly, other combiners, such as analog voltage subtractors using operational amplifiers, or special purpose digital hardware, may be used in place of the combiners 36 and 38.

Schematically, the function of the direct signal separator 30 may be seen by referring to Figure 6. Assuming that there are no echoes or reverberations, the acoustic wave signal received by the microphone 12 is the sum of source B and a delayed copy of source A. (For clarity in presentation here and in the forthcoming theory section, time relationships between the sources A and B and the microphones 10 and 12 are described as if the electrical signal 14 generated by the microphone 10 were simultaneous with source A and the electrical signal 16 generated by the microphone 12 were simultaneous with source B. This determines the two arbitrary additive time constants that one is free to choose in each channel.) Thus, the electrical signal 16 generated by the microphone 12 is an electrical representation of the sound source B plus a delayed copy of source A. Similarly, the electrical signal 14 generated by the microphone 10 is an electrical representation of the sound source A and a delayed copy of sound source B. By delaying the electrical signal 14 an appropriate amount, the electrical signal supplied to the negative input of the combiner 38 would represent a delayed copy of source A plus a further delayed copy of source B. Subtracting the output of the delay line 32 from the digital signal 16 would remove the signal component representing the delayed copy of sound source A, leaving only the pure sound B (along with the further delayed copy of B).

The amount of delay to be set for each of the digital delay lines 32 and 34 can be supplied from the DOA estimator 20. Numerous methods for estimating the relative time delays have been described in the prior art (for example, Schmidt, 1981; Roy et al., 1988; Morita, 1991; Allen, 1991). Thus, the DOA estimator 20 is well known in the art.

In a different embodiment, omnidirectional microphones 10 and 12 could be replaced by directional microphones placed very close together. Then all delays would be replaced by multipliers; in particular, digital delay lines 32 and 34 would be replaced by multipliers. Each multiplier would receive the signal from its respective A/D converter and generate a scaled signal, which can be either positive or negative, in response.

A preferred embodiment of the crosstalk remover 50 is shown in greater detail in Figure 3. The crosstalk remover 50 comprises a third combiner 56 for receiving the first output signal 40 from the direct signal separator 30. The third combiner 56 also receives, at its negative input, the output of a second adaptive filter 54. The output of the third combiner 56 is supplied to a first adaptive filter 52. The output of the first adaptive filter 52 is supplied to the negative input of the fourth combiner 58, to which the second output signal 42 from the direct signal separator 30 is also supplied. The outputs of the third and fourth combiners 56 and 58, respectively, are the output signals 60 and 62, respectively, of the crosstalk remover 50.

Schematically, the function of the crosstalk remover 50 may be seen by referring to Figure 7. The inputs 40 and 42 to the crosstalk remover 50 are the outputs of the direct-signal separator 30. Let us assume that the direct-signal separator 30 has become fully adapted, i.e., (a) that the electrical signal 40 represents the acoustic wave signals of source B and its echoes and reverberation, plus echoes and reverberation of source A, and similarly (b) that the electrical signal 42 represents the acoustic wave signals of source A and its echoes and reverberation, plus echoes and reverberation of source B. Because the crosstalk remover 50 is a feedback network, it is easiest to analyze subject to the assumption that adaptive filters 52 and 54 are fully adapted, so that the electrical signals 60 and 62 already correspond to colored versions of B and A, respectively. The processing of the electrical signal 60 by the adaptive filter 52 will generate an electrical signal equal to the echoes and reverberation of source B present in the electrical signal 42; hence subtraction of the output of adaptive filter 52 from the electrical signal 42 leaves output signal 62 with signal components only from source A. Similarly, the processing of the electrical signal 62 by the adaptive filter 54 will generate an electrical signal equal to the echoes and reverberation of source A present in the electrical signal 40; hence subtraction of the output of adaptive filter 54 from the electrical signal 40 leaves output signal 60 with signal components only from source B.

Theory 

It is assumed, solely for the purpose of designing the direct-signal separator 30, that the microphones 10 and 12 are omnidirectional and matched in sensitivity.

Under anechoic conditions, the signals $x_1(t)$ and $x_2(t)$, which correspond to the input signals received by microphones 10 and 12, respectively, may be modeled as

$$\begin{aligned} x_1(t) &= w_1(t) + w_2(t - \tau_2) \\ x_2(t) &= w_2(t) + w_1(t - \tau_1), \end{aligned}$$

where $w_1(t)$ and $w_2(t)$ are the original source signals, as they reach microphones 10 and 12, respectively, and $\tau_1$ and $\tau_2$ are unknown relative time delays, each of which may be positive or negative.

Practitioners experienced in the art will recognize that bounded "negative" time delays can be achieved by adding a net time delay to the entire system.

The relative time delays $\tau_1$ and $\tau_2$ are used to form outputs $y_1(t)$ and $y_2(t)$, which correspond to signals 40 and 42:

$$\begin{aligned} y_1(t) &= x_1(t) - x_2(t - \tau_2) = w_1(t) - w_1(t - (\tau_1 + \tau_2)) \\ y_2(t) &= x_2(t) - x_1(t - \tau_1) = w_2(t) - w_2(t - (\tau_1 + \tau_2)) \end{aligned}$$

As depicted in Figure 2, these operations are accomplished by time-delay units 32 and 34, and combiners 36 and 38.

Under anechoic conditions, these outputs 40 and 42 would be fully separated; i.e., each output 40 or 42 would contain contributions from one source alone. However, under echoic conditions these outputs 40 and 42 are not fully separated.
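The anechoic separation equations above can be checked numerically. This is a minimal sketch, assuming non-negative integer delays (the preferred embodiment also needs subsample delays, via the sinc filter described earlier) and synthetic random sources; the helper names are hypothetical.

```python
import numpy as np

def delay(x, d):
    """Delay sequence x by d >= 0 whole samples, zero-filling the start."""
    return np.concatenate([np.zeros(d), x[:len(x) - d]])

def direct_signal_separator(x1, x2, tau1, tau2):
    """First stage: y1 = x1(n) - x2(n - tau2), y2 = x2(n) - x1(n - tau1)."""
    return x1 - delay(x2, tau2), x2 - delay(x1, tau1)

# Synthetic anechoic mixture; the delays are illustrative values.
rng = np.random.default_rng(1)
w1, w2 = rng.standard_normal(500), rng.standard_normal(500)
tau1, tau2 = 2, 3
x1 = w1 + delay(w2, tau2)   # mic 10 hears A plus delayed B
x2 = w2 + delay(w1, tau1)   # mic 12 hears B plus delayed A
y1, y2 = direct_signal_separator(x1, x2, tau1, tau2)
```

Each output contains only its own source and a copy of it delayed by $\tau_1 + \tau_2$, which is the coloration the later post filters are meant to undo.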

Under echoic and reverberant conditions, the microphone signals $x_1(t)$ and $x_2(t)$, which correspond to input signals received by the microphones 10 and 12, respectively, may be modeled as

$$\begin{aligned} x_1(t) &= w_1(t) + w_2(t - \tau_2) + k'_{11}(t) * w_1(t) + k'_{12}(t) * w_2(t) \\ x_2(t) &= w_2(t) + w_1(t - \tau_1) + k'_{21}(t) * w_1(t) + k'_{22}(t) * w_2(t), \end{aligned}$$

where the symbol $*$ denotes the operation of convolution, and the impulse responses $k'_{11}(t)$, $k'_{12}(t)$, $k'_{21}(t)$, and $k'_{22}(t)$ incorporate the effects of echoes and reverberation.
Specifically, $k'_{11}(t) * w_1(t)$ represents the echoes and reverberations of source 1 ($w_1(t)$) as received at input 1 (microphone 10), $k'_{12}(t) * w_2(t)$ represents the echoes and reverberations of source 2 ($w_2(t)$) as received at input 1 (microphone 10), $k'_{21}(t) * w_1(t)$ represents the echoes and reverberations of source 1 ($w_1(t)$) as received at input 2 (microphone 12), and $k'_{22}(t) * w_2(t)$ represents the echoes and reverberations of source 2 ($w_2(t)$) as received at input 2 (microphone 12).
In consequence of the presence of echoes and reverberation, the outputs 40 and 42 from the direct-signal separator 30 are not fully separated, but instead take the form

$$\begin{aligned} y_1(t) &= x_1(t) - x_2(t - \tau_2) \\ &= w_1(t) - w_1(t - (\tau_1 + \tau_2)) + k_{11}(t) * w_1(t) + k_{12}(t) * w_2(t) \\ y_2(t) &= x_2(t) - x_1(t - \tau_1) \\ &= w_2(t) - w_2(t - (\tau_1 + \tau_2)) + k_{21}(t) * w_1(t) + k_{22}(t) * w_2(t) \end{aligned}$$

where the filters $k_{11}(t)$, $k_{12}(t)$, $k_{21}(t)$, and $k_{22}(t)$ are related to $k'_{11}(t)$, $k'_{12}(t)$, $k'_{21}(t)$, and $k'_{22}(t)$ by time shifts and linear combinations. Specifically,

$$\begin{aligned} k_{11}(t) &= k'_{11}(t) - k'_{21}(t - \tau_2), \\ k_{12}(t) &= k'_{12}(t) - k'_{22}(t - \tau_2), \\ k_{21}(t) &= k'_{21}(t) - k'_{11}(t - \tau_1), \text{ and} \\ k_{22}(t) &= k'_{22}(t) - k'_{12}(t - \tau_1). \end{aligned}$$

Note that $y_1(t)$ is contaminated by the term $k_{12}(t) * w_2(t)$, and that $y_2(t)$ is contaminated by the term $k_{21}(t) * w_1(t)$.

Several possible forms of the crosstalk remover have been described as part of the background of this invention, under the heading of convolutive blind source separation. In the present embodiment, the crosstalk remover forms discrete time sampled outputs 60 and 62 thus:

$$z_1(n) = y_1(n) - \sum_{k=1}^{1000} h_2(k)\, z_2(n-k)$$

$$z_2(n) = y_2(n) - \sum_{k=1}^{1000} h_1(k)\, z_1(n-k)$$

where the discrete time filters $h_1$ and $h_2$ correspond to elements 52 and 54 in Figure 3 and are estimated adaptively. The filters $h_1$ and $h_2$ are strictly causal, i.e., they operate only on past samples of $z_1$ and $z_2$. This structure was described independently by Jutten et al. (1992) and by Platt and Faggin (1992).

The adaptation rule used for the filter coefficients in the preferred embodiment is a variant of the LMS rule ("Adaptive Signal Processing," Bernard Widrow and Samuel D. Stearns, Prentice-Hall, Englewood Cliffs, N.J., 1985, p. 99). The filter coefficients are updated at every time-step n, after the new values of the outputs $z_1(n)$ and $z_2(n)$ have been calculated. Specifically, using these new values of the outputs, the filter coefficients are updated as follows:

$$h_1(k)[\text{new}] = h_1(k)[\text{old}] + m\, z_2(n)\, z_1(n-k), \qquad k = 1, 2, \ldots, 1000$$

$$h_2(k)[\text{new}] = h_2(k)[\text{old}] + m\, z_1(n)\, z_2(n-k), \qquad k = 1, 2, \ldots, 1000$$

where m is a constant that determines the rate of adaptation of the filter coefficients, e.g., 0.15 if the signals from microphones 10 and 12 were normalized to lie in the range $-1 \le x(t) \le +1$. One skilled in the art will recognize that the filters $h_1$ and $h_2$ can be implemented in a variety of ways, including FIRs and lattice IIRs.

As described, the direct-signal separator 30 and crosstalk remover 50 adaptively bring about full separation of two sound sources mixed in an echoic, reverberant acoustic environment. However, the output signals $z_1(t)$ and $z_2(t)$ may be unsatisfactory in practical applications because they are colored versions of the original sources $w_1(t)$ and $w_2(t)$, i.e.,

$$z_1(t) = c_1(t) * w_1(t), \qquad z_2(t) = c_2(t) * w_2(t),$$

where $c_1(t)$ and $c_2(t)$ represent the combined effects of the echoes and reverberations and of the various known signal transformations performed by the direct-signal separator 30 and crosstalk remover 50.
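The two-channel crosstalk remover equations and LMS-style update given above can be sketched as follows. This is a minimal sketch, assuming NumPy and FIR filters stored as arrays indexed by k = 1..N (index 0 is unused because the filters are strictly causal); the function name is hypothetical, and the loop favors clarity over speed.

```python
import numpy as np

def crosstalk_remover(y1, y2, n_taps=1000, m=0.15):
    """Feedback crosstalk remover with the LMS-like update described above.

    z1(n) = y1(n) - sum_{k=1..N} h2(k) z2(n-k)
    z2(n) = y2(n) - sum_{k=1..N} h1(k) z1(n-k)
    then h1(k) += m z2(n) z1(n-k) and h2(k) += m z1(n) z2(n-k).
    """
    n_samples = len(y1)
    h1 = np.zeros(n_taps + 1)  # h1[k] for k = 1..n_taps; h1[0] stays 0
    h2 = np.zeros(n_taps + 1)
    z1 = np.zeros(n_samples)
    z2 = np.zeros(n_samples)
    for n in range(n_samples):
        lo = max(0, n - n_taps)
        past1 = z1[lo:n][::-1]  # past1[k-1] = z1(n-k); zeros before start
        past2 = z2[lo:n][::-1]
        kk = len(past1)
        # Outputs use only past samples of the other channel's output.
        z1[n] = y1[n] - h2[1:kk + 1] @ past2
        z2[n] = y2[n] - h1[1:kk + 1] @ past1
        # Coefficients are updated after both new outputs are known.
        h1[1:kk + 1] += m * z2[n] * past1
        h2[1:kk + 1] += m * z1[n] * past2
    return z1, z2, h1, h2
```

For example, if $y_1(n) = s_1(n) + 0.5\,s_2(n-1)$ and $y_2(n) = s_2(n) + 0.3\,s_1(n-1)$ for independent white sources, the first taps should settle near $h_2(1) \approx 0.5$ and $h_1(1) \approx 0.3$. The test below uses a small `n_taps` and a smaller `m` than the values quoted above so that it runs quickly and settles reliably.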

As an optional cosmetic improvement for certain commercial applications, it may be desirable to append filters 70 and 72 to the network. The purpose of these filters is to undo the effects of filters $c_1(t)$ and $c_2(t)$. As those familiar with the art will realize, a large body of techniques currently exists for performing this inversion to varying and predictable degrees of accuracy.

The embodiment of the signal processor 8 has been described in Figures 1-3 and 6-7 as being useful with two microphones 10 and 12 for separating two sound sources, A and B. Clearly, the invention is not so limited. The forthcoming section describes how more than two microphones and sound sources can be accommodated.






General case with M microphones and sources 

The invention is able to separate an arbitrary number M of simultaneous sources, as long as they are statistically independent, if there are at least M microphones.

Let $w_j(t)$ be the j'th source signal and $x_i(t)$ be the i'th microphone (mic) signal. Let $t_{ij}$ be the time required for sound to propagate from source j to mic i, and let $d(t_{ij})$ be the impulse response of a filter that delays a signal by $t_{ij}$. Mathematically, $d(t_{ij})$ is the unit impulse delayed by $t_{ij}$, that is

$$d(t_{ij}) = \delta(t - t_{ij}),$$

where $\delta(t)$ is the unit impulse function ("Circuits, Signals and Systems", by William McC. Siebert, The MIT Press, McGraw-Hill Book Company, 1986, p. 319).

In the absence of echoes and reverberation, the i'th mic signal $x_i(t)$ can be expressed as a sum of the appropriately delayed source signals

$$x_i(t) = \sum_{j=1}^{M} d(t_{ij}) * w_j(t).$$

Matrix representation allows a compact representation of this equation for all M mic signals:

$$X(t) = D(t) * W(t),$$

where $X(t)$ is an M-element column vector whose i'th element is the i'th mic signal $x_i(t)$, $D(t)$ is an $M \times M$ element square matrix whose ij'th element (i.e., the element in the i'th row and j'th column) is $d(t_{ij})$, and $W(t)$ is an M-element column vector whose j'th element is the j'th source signal $w_j(t)$. Specifically,

$$\begin{bmatrix} x_1(t) \\ x_2(t) \\ \vdots \\ x_M(t) \end{bmatrix} = \begin{bmatrix} d(t_{11}) & d(t_{12}) & \cdots & d(t_{1M}) \\ d(t_{21}) & d(t_{22}) & \cdots & d(t_{2M}) \\ \vdots & \vdots & \ddots & \vdots \\ d(t_{M1}) & d(t_{M2}) & \cdots & d(t_{MM}) \end{bmatrix} * \begin{bmatrix} w_1(t) \\ w_2(t) \\ \vdots \\ w_M(t) \end{bmatrix}$$

For each source $w_j(t)$, if the delays $t_{ij}$ for $i = 1, 2, \ldots, M$ to the M mics are known (up to an arbitrary constant additive factor that can be different for each source), then M signals $y_j(t)$, $j = 1, 2, \ldots, M$, that each contain energy from a single but different source $w_j(t)$, can be constructed from the mic signals $x_i(t)$ as follows:

$$Y(t) = \operatorname{adj}D(t) * X(t),$$

where

$$Y(t) = \begin{bmatrix} y_1(t) \\ y_2(t) \\ \vdots \\ y_M(t) \end{bmatrix}$$

is the M-element column vector whose j'th element is the separated signal $y_j(t)$, and $\operatorname{adj}D(t)$ is the adjugate matrix of the matrix $D(t)$. The adjugate matrix of a square matrix is the matrix obtained by replacing each element of the original matrix by its cofactor, and then transposing the result ("Linear Systems", by Thomas Kailath, Prentice Hall, Inc., 1980, p. 649). The product of the adjugate matrix and the original matrix is a diagonal matrix, with each element along the diagonal being equal to the determinant of the original matrix. Thus,



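$$Y(t) = \operatorname{adj}D(t) * D(t) * W(t) = \begin{bmatrix} |D(t)| & 0 & \cdots & 0 \\ 0 & |D(t)| & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & |D(t)| \end{bmatrix} * W(t)$$

where $|D(t)|$ is the determinant of $D(t)$. Thus,

$$y_j(t) = |D(t)| * w_j(t) \qquad \text{for } j = 1, 2, \ldots, M.$$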
$y_j(t)$ is a "colored" or filtered version of $w_j(t)$ because of the convolution by the filter impulse response $|D(t)|$. If desired, this coloration can be undone by post filtering the outputs by a filter that is the inverse of $|D(t)|$. Under certain circumstances, determined by the highest frequency of interest in the source signals and the separation between the mics, the filter $|D(t)|$ may have zeroes at certain frequencies; these make it impossible to exactly realize the inverse of the filter $|D(t)|$. Under these circumstances any one of the numerous techniques available for approximating filter inverses (see, for example, "Digital Filters and Signal Processing", by Leland B. Jackson, Kluwer Academic Publishers, 1986, p. 146) may be used to derive an approximate filter with which to do the post filtering.

The delays $t_{ij}$ can be estimated from the statistical properties of the mic signals, up to a constant additive factor that can be different for each source. Alternatively, if the position of each microphone and each source is known, then the delays $t_{ij}$ can be calculated exactly. For any source that is distant, i.e., many times farther than the greatest separation between the mics, only the direction of the source is needed to calculate its delays to each mic, up to an arbitrary additive constant.
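For a distant source, the relative delays follow from plane-wave geometry: a wavefront arriving from unit direction u reaches a mic at position p earlier by (p · u)/c. A minimal sketch, assuming a speed of sound of 343 m/s and mic coordinates in metres (neither value is given in the text); the function name is hypothetical, and the delays are defined only up to a common additive constant, as stated above.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value

def far_field_delays(mic_positions, direction):
    """Relative delays t_i (seconds) from a distant source to each mic.

    mic_positions: (M, dim) coordinates in metres.
    direction: vector pointing from the array toward the source.
    Returns t_i = -(p_i . u) / c, valid up to one additive constant.
    """
    u = np.asarray(direction, dtype=float)
    u = u / np.linalg.norm(u)
    return -(np.asarray(mic_positions, dtype=float) @ u) / SPEED_OF_SOUND
```

With the 2 cm spacing of the preferred embodiment and a source along the microphone axis, the relative delay is 0.02/343 s, about 58 microseconds or roughly 1.29 sampling intervals at 22,050 samples per second, which is why a subsample delay filter is needed.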

The first stage of the processor 8, namely the direct signal separator 30, uses the estimated delays to construct the adjugate matrix $\operatorname{adj}D(t)$, which it applies to the microphone signals $X(t)$ to obtain the outputs $Y(t)$ of the first stage, given by:

$$Y(t) = \operatorname{adj}D(t) * X(t).$$

In the absence of echoes and reverberations, each output $y_j(t)$ contains energy from a single but different source $w_j(t)$.

When echoes and reverberation are present, each mic receives the direct signals from the sources as well as echoes and reverberations from each source. Thus

$$x_i(t) = \sum_{j=1}^{M} d(t_{ij}) * w_j(t) + \sum_{j=1}^{M} e_{ij}(t) * w_j(t),$$

where $e_{ij}(t)$ is the impulse response of the echo and reverberation path from the j'th source to the i'th mic. All M of these equations can be represented in compact matrix notation by

$$X(t) = D(t) * W(t) + E(t) * W(t),$$

where $E(t)$ is the $M \times M$ matrix whose ij'th element is the filter $e_{ij}(t)$.

If the mic signals are now convolved with the adjugate matrix of $D(t)$, instead of obtaining separated signals we obtain partially separated signals:

$$Y(t) = \operatorname{adj}D(t) * X(t) = |D(t)| * W(t) + \operatorname{adj}D(t) * E(t) * W(t).$$

Notice that each $y_j(t)$ contains a colored direct signal from a single source, as in the case with no echoes, and differently colored components from the echoes and reverberations of every source, including the direct one.

The echoes and reverberations of the other sources are removed by the second stage of the network, namely the crosstalk remover 50, which generates each output as follows:

$$z_j(t) = y_j(t) - \sum_{k \neq j} h_{jk}(t) * z_k(t) \qquad \text{for } j = 1, 2, \ldots, M,$$

where the entities $h_{jk}(t)$ are causal adaptive filters. (The term "causal" means that $h_{jk}(t) = 0$ for $t \le 0$.) In matrix form these equations are written as

$$Z(t) = Y(t) - H(t) * Z(t),$$

where $Z(t)$ is the M-element column vector whose j'th element is $z_j(t)$, and $H(t)$ is an $M \times M$ element matrix whose diagonal elements are zero and whose off-diagonal elements are the causal, adaptive filters $h_{jk}(t)$.

These filters are adapted according to a rule that is similar to the Least Mean Square update rule of adaptive filter theory ("Adaptive Signal Processing," Bernard Widrow and Samuel D. Stearns, Prentice-Hall, Englewood Cliffs, NJ, 1985, p. 99).

This is most easily illustrated in the case of a discrete time system.

Illustrative weight update methodology for use with a discrete time representation

First, we replace the time parameter t by a discrete time index n. Second, we use the notation $H(n)[\text{new}]$ to indicate the value of $H(n)$ in effect just before computing new outputs at time n. At each time step n, the outputs $Z(n)$ are computed according to

$$Z(n) = Y(n) - H(n)[\text{new}] * Z(n).$$

Note that the convolution on the right-hand side involves only past values of Z, i.e., $Z(n-1), Z(n-2), \ldots, Z(n-N)$, because the filters that are the elements of H are causal. (N is defined to be the order of the filters in H.)

Now new values are computed for the coefficients of the filters that are the elements of H. These will be used at the next time step. Specifically, for each j and each k, with $j \neq k$, perform the following:

$$h_{jk}(u)[\text{new}] = h_{jk}(u)[\text{old}] + m\, z_j(n)\, z_k(n-u), \qquad u = 1, 2, \ldots, N,$$

where m is a constant that determines the rate of adaptation.



The easiest way to understand the operation of the second stage is to observe that the off-diagonal elements of $H(t)$ have zero net change per unit time when the products like $z_j(t)\, z_k(t-u)$ are zero on average. Because the sources in W are taken to be statistically independent of each other, those products are zero on average when each output $z_j(t)$ has become a colored version of a different source, say $w_j(t)$. (The correspondence between sources and outputs might be permuted so that the numbering of the sources does not match the numbering of the outputs.)

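More specifically, let $Z(t) = \Psi(t) * W(t)$. From the preceding paragraph, equilibrium is achieved when $\Psi(t)$ is diagonal. In addition, it is required that:

$$\begin{aligned} Z(t) &= Y(t) - H(t) * \Psi(t) * W(t) \\ &= |D(t)| * W(t) + \operatorname{adj}D(t) * E(t) * W(t) - H(t) * \Psi(t) * W(t) \\ &= \left(|D(t)|\, I + \operatorname{adj}D(t) * E(t) - H(t) * \Psi(t)\right) * W(t) \end{aligned}$$

so that

$$\Psi(t) = |D(t)|\, I + \operatorname{adj}D(t) * E(t) - H(t) * \Psi(t)$$

$$\Psi(t) = \left[I + H(t)\right]^{-1} * \left[|D(t)|\, I + \operatorname{adj}D(t) * E(t)\right]$$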

This relation determines the coloration produced by the two stages of the system, taken 
together. 

An optional third stage can use any one of numerous techniques available to reduce 
the amount of coloration on any individual output. 






Example of general case with M = 3, i.e., with 3 mics and 3 sources




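In the case where there are 3 mics and 3 sources, the general matrix equation

$$X = D * W$$

becomes

$$\begin{bmatrix} x_1(t) \\ x_2(t) \\ x_3(t) \end{bmatrix} = \begin{bmatrix} d(t_{11}) & d(t_{12}) & d(t_{13}) \\ d(t_{21}) & d(t_{22}) & d(t_{23}) \\ d(t_{31}) & d(t_{32}) & d(t_{33}) \end{bmatrix} * \begin{bmatrix} w_1(t) \\ w_2(t) \\ w_3(t) \end{bmatrix}$$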

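If the delays $t_{ij}$ are known, then the adjugate matrix of $D(t)$ is given by

$$\operatorname{adj}D = \begin{bmatrix} d(t_{22}+t_{33}) - d(t_{23}+t_{32}) & d(t_{13}+t_{32}) - d(t_{12}+t_{33}) & d(t_{12}+t_{23}) - d(t_{13}+t_{22}) \\ d(t_{23}+t_{31}) - d(t_{21}+t_{33}) & d(t_{11}+t_{33}) - d(t_{13}+t_{31}) & d(t_{13}+t_{21}) - d(t_{11}+t_{23}) \\ d(t_{21}+t_{32}) - d(t_{22}+t_{31}) & d(t_{12}+t_{31}) - d(t_{11}+t_{32}) & d(t_{11}+t_{22}) - d(t_{12}+t_{21}) \end{bmatrix}$$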



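Note that adding a constant delay c to the delays associated with any column j of $D(t)$ leaves row j of the adjugate matrix unchanged and merely delays every other row by c; the corresponding outputs are delayed, but each still contains a single source. This is why the delays from a source to the three mics need only be estimated up to an arbitrary additive constant.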
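The 3×3 adjugate, the identity $\operatorname{adj}D * D = |D|\, I$, and the additive-constant property can be checked numerically by representing each $d(t_{ij})$ as a unit impulse at an integer delay and using convolution wherever the text multiplies filters. This is a sketch under assumptions: the integer delays are made-up illustrative values, the helper names are hypothetical, and the recursive cofactor expansion is chosen for transparency rather than efficiency.

```python
import numpy as np

def impulse(delay, length=16):
    """d(t): unit impulse delayed by an integer number of samples (< length)."""
    h = np.zeros(length)
    h[delay] = 1.0
    return h

def padd(a, b):
    """Sum of two impulse responses of possibly different lengths."""
    out = np.zeros(max(len(a), len(b)))
    out[:len(a)] += a
    out[:len(b)] += b
    return out

def det(A):
    """Determinant of a square matrix of filters; products are convolutions."""
    if len(A) == 1:
        return A[0][0]
    total = np.zeros(1)
    for j in range(len(A)):
        minor = [row[:j] + row[j + 1:] for row in A[1:]]
        term = np.convolve(A[0][j], det(minor))
        total = padd(total, term if j % 2 == 0 else -term)
    return total

def adjugate(A):
    """adj[i][j] = (-1)^(i+j) det(A with row j and column i removed)."""
    n = len(A)
    adj = []
    for i in range(n):
        row_out = []
        for j in range(n):
            minor = [row[:i] + row[i + 1:] for r, row in enumerate(A) if r != j]
            c = det(minor)
            row_out.append(c if (i + j) % 2 == 0 else -c)
        adj.append(row_out)
    return adj

def matmul(A, B):
    """Product of filter matrices, convolving each pair of entries."""
    out = []
    for i in range(len(A)):
        row_out = []
        for j in range(len(B[0])):
            acc = np.zeros(1)
            for k in range(len(B)):
                acc = padd(acc, np.convolve(A[i][k], B[k][j]))
            row_out.append(acc)
        out.append(row_out)
    return out

# Illustrative integer delays t_ij (in samples), not taken from the text.
t = [[0, 3, 5], [2, 0, 4], [1, 6, 0]]
D = [[impulse(t[i][j]) for j in range(3)] for i in range(3)]
adjD = adjugate(D)
detD = det(D)
P = matmul(adjD, D)  # expected: |D| on the diagonal, zero elsewhere
```

Shifting column 0 of the delay table by a constant and recomputing `adjugate` leaves row 0 identical and delays rows 1 and 2 by that constant, matching the statement above.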

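The output of the first stage, namely the direct signal separator 30, is formed by convolving the mic signals with the adjugate matrix.

$$\begin{bmatrix} y_1(t) \\ y_2(t) \\ y_3(t) \end{bmatrix} = \operatorname{adj}D * \begin{bmatrix} x_1(t) \\ x_2(t) \\ x_3(t) \end{bmatrix}$$

The network that accomplishes this is shown in Figure 4.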

In the absence of echoes and reverberations, the outputs of the first stage are the 
individual sources, each colored by the determinant of the delay matrix. 



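$$
\begin{bmatrix} y_1(t) \\ y_2(t) \\ y_3(t) \end{bmatrix}
= \operatorname{adj}D * D(t) *
\begin{bmatrix} w_1(t) \\ w_2(t) \\ w_3(t) \end{bmatrix}
= |D(t)| *
\begin{bmatrix} w_1(t) \\ w_2(t) \\ w_3(t) \end{bmatrix}
$$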











In the general case when echoes and reverberation are present, each output of the first stage also contains echoes and reverberations from each source. The second stage, namely the cross talk remover 50, consisting of a feedback network of adaptive filters, removes the effects of these unwanted echoes and reverberations to produce outputs that each contain energy from only one source, a different source for each output. The matrix equation of the second stage

$$Z = Y - H * Z$$

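becomes

$$
\begin{bmatrix} z_1(t) \\ z_2(t) \\ z_3(t) \end{bmatrix}
=
\begin{bmatrix} y_1(t) \\ y_2(t) \\ y_3(t) \end{bmatrix}
-
\begin{bmatrix}
0 & h_{12}(t) & h_{13}(t) \\
h_{21}(t) & 0 & h_{23}(t) \\
h_{31}(t) & h_{32}(t) & 0
\end{bmatrix}
*
\begin{bmatrix} z_1(t) \\ z_2(t) \\ z_3(t) \end{bmatrix}
$$

where each $h_{ij}$ is a causal adaptive filter. The network that accomplishes this is shown in Figure 5.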
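The feedback equation can be evaluated sample by sample. The sketch below is an illustration under an assumption of ours that the text does not state: every cross filter $h_{ij}$ is strictly causal ($h_{ij}[0] = 0$), so each new output sample depends only on past outputs and no instantaneous loop has to be solved. The filter taps are random placeholders rather than adapted values; the usage builds $Y$ from a known $Z$ and runs the network on it.

```python
import numpy as np

def crosstalk_remover(y, h):
    """Evaluate Z = Y - H * Z sample by sample.

    y: (M, T) array of first-stage outputs y_i.
    h: (M, M, K) array of FIR taps; h[i, j] filters z_j into channel i.
       Assumes h[i, i] == 0 and h[i, j, 0] == 0 (strictly causal).
    """
    M, T = y.shape
    K = h.shape[2]
    z = np.zeros((M, T))
    for n in range(T):
        for i in range(M):
            acc = y[i, n]
            for j in range(M):
                if j == i:
                    continue
                for m in range(1, min(K, n + 1)):
                    acc -= h[i, j, m] * z[j, n - m]
            z[i, n] = acc
    return z

# Usage: synthesize y from a known z, then run the network on y.
rng = np.random.default_rng(0)
M, T, K = 3, 50, 4
h = rng.normal(scale=0.2, size=(M, M, K))
h[:, :, 0] = 0.0                      # strictly causal taps
for i in range(M):
    h[i, i, :] = 0.0                  # zero diagonal, as in the matrix above
z_true = rng.normal(size=(M, T))
y = z_true.copy()
for i in range(M):
    for j in range(M):
        y[i] += np.convolve(h[i, j], z_true[j])[:T]   # y = z + H * z
z = crosstalk_remover(y, h)
```

Because $y$ was built as $z + H * z$, the network returns `z_true`; in the invention the taps would instead be adapted so that the outputs become independent.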

It should be noted that the number of microphones and associated channels of signal processing need not be as large as the total number of sources present, as long as the number of sources emitting a significant amount of sound at any given instant in time does not exceed the number of microphones. For example, if during one interval of time only sources A and B emit sound, and in another interval of time only sources B and C emit sound, then during the first interval the output channels will correspond to A and B respectively, and during the second interval the output channels will correspond to B and C respectively.

Referring to Figure 8, there is shown a block level schematic diagram of a direction of arrival estimator 200 of the present invention. The direction of arrival estimator 200 may be used with the direct signal separator 30 shown in Figure 1 to form an adaptive filter signal processor.

As previously described, the function of the direct signal separator 30 is to receive digital acoustic signals 26 and 28 (for the two-microphone embodiment shown in Figure 1), represented as $x_1(t)$ and $x_2(t)$, respectively, where:

$$x_1(t) = w_1(t) + w_2(t - \tau_2)$$
$$x_2(t) = w_2(t) + w_1(t - \tau_1)$$

and to process them to obtain the processed signals $y_1(t)$ and $y_2(t)$, which correspond to signals 40 and 42:

$$y_1(t) = x_1(t) - x_2(t - \tau_2) = w_1(t) - w_1(t - (\tau_1 + \tau_2))$$
$$y_2(t) = x_2(t) - x_1(t - \tau_1) = w_2(t) - w_2(t - (\tau_1 + \tau_2))$$
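The two-microphone relations above can be checked with a short sketch. It assumes integer-sample delays and zero signal before $t = 0$ (the text allows fractional delays); the source signals and delay values are hypothetical.

```python
import numpy as np

def delayed(x, d):
    """Return x(t - d) for an integer delay d >= 0 samples, zero before t = 0."""
    out = np.zeros_like(x)
    out[d:] = x[:len(x) - d]
    return out

def direct_separator(x1, x2, tau1, tau2):
    """y1 = x1(t) - x2(t - tau2), y2 = x2(t) - x1(t - tau1)."""
    return x1 - delayed(x2, tau2), x2 - delayed(x1, tau1)

rng = np.random.default_rng(1)
w1, w2 = rng.normal(size=200), rng.normal(size=200)   # hypothetical sources
tau1, tau2 = 3, 5                                      # hypothetical delays (samples)
x1 = w1 + delayed(w2, tau2)
x2 = w2 + delayed(w1, tau1)
y1, y2 = direct_separator(x1, x2, tau1, tau2)
```

When the delays passed to `direct_separator` match the ones used in the mixing, $y_1$ contains only $w_1$ (minus a copy of itself delayed by $\tau_1 + \tau_2$) and $y_2$ contains only $w_2$, as the equations state.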

Under anechoic conditions, i.e., in the absence of echoes and reverberations of said acoustic waves from said plurality of sources, each of said first processed acoustic signals, $y_1(t)$ and $y_2(t)$, represents acoustic waves from only one source, a different one for each signal. To achieve this result, the variables $\tau_1$ and $\tau_2$ must match the actual delays of the acoustic waves from the sources reaching the microphones 10 and 12, as shown graphically in Figure 6. In the embodiment described heretofore, the variables $\tau_1$ and $\tau_2$ are supplied by the DOA estimator 20. At best, however, this is only an estimate. The aim of the present invention is to perform this estimation accurately and rapidly.

In Figure 8, the preferred embodiment of the Direction of Arrival estimator 200 of the present invention is shown as receiving two input sample signals 26 and 28. However, as is clearly described heretofore, the present invention is not so limited, and those skilled in the art will recognize that the present invention may be used to process any number of input signals. As the number of input signals rises, the task of estimating directions of arrival becomes more computationally intensive. Nonetheless, the number of input signals that the present invention can process is limited only by the available data and computational resources, not by a fundamental principle of operation.

The purpose of the direction-of-arrival estimator 200 is to estimate a pair of delay values, namely $\tau_1$ and $\tau_2$, that are consistent with the input signals $x_1(t)$ and $x_2(t)$ in the manner described above. For purposes of discussion, a pair of delay values is termed a set of parameter values.

A clock 210 is present in the preferred embodiment and runs at a "fast" rate of, e.g., a multiple of 100 cycles per second. The fast clock rate is supplied to the block 202, which comprises the parameter generator 220, the parameter latch 230, the instantaneous-performance calculator 240, the instantaneous-performance latch 250, the instantaneous-performance comparator 260, and the gate 270.

The "fast" clock rate, a multiple of 100 cycles per second, is divided by a divider 204, which yields a "slow" clock rate of, e.g., 100 cycles per second. The "slow" clock rate is supplied to the components in block 206, each of which executes once per "slow" clock cycle. The order of their execution is evident from Figure 8. Although the "fast" and "slow" clock rates are described here as running synchronously, one skilled in the art will recognize that their operation could be made asynchronous without departing from the spirit and scope of the invention.

Parameter sets originate at the parameter generator 220. Each time a parameter set is created, it is supplied to the parameter latch 230 for temporary storage. The first N parameter sets that arrive at the parameter latch 230 are stored in the parameter population storage 230(1-N), which holds N parameter sets, where N is 80 to 100 in the preferred embodiment. Thereafter, a parameter set is transferred from the parameter latch 230 to the parameter population storage 230(1-N) (and a parameter set previously in the parameter population storage 230(1-N) is discarded) only at times determined by the state of the gate 270, which in turn is determined by conditions described below.

Once per "slow" clock cycle, one parameter set is copied from the parameter population storage 230(1-N) to the best-parameter latch 300. Which parameter set is copied is determined by selector 290 using information from the cumulative-performance comparator 310, whose operation is described below. The parameter set in the smoothed-parameter latch 320 is updated via a leaky integrator 330 from the value in the best-parameter latch 300. In the preferred embodiment, the leaky integrator 330 performs the operation:

$$\tau_i(\text{new}) = (1 - \varepsilon)\,\tau_i(\text{old}) + \varepsilon\,\tau_i(\text{best-parameter latch})$$

where the constant $\varepsilon$ is set as follows:

$$\varepsilon = 1 - (1/2)^{1/k}$$

where k is about 10, so that the leaky integrator has a half-life of approximately 10 clock cycles. At each "slow" clock cycle, the parameter set in the smoothed-parameter latch 320 is supplied to the direct-signal separator 30.
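The relation between $\varepsilon$ and the half-life follows from $(1 - \varepsilon)^k = 1/2$: after k updates toward a fixed target, half of the original gap remains. The sketch below illustrates the leaky integrator 330 for a single delay value; the step target of 1.0 is a hypothetical input chosen to make the half-life visible.

```python
def smoothing_constant(k):
    """epsilon = 1 - (1/2)**(1/k), giving a half-life of k slow-clock cycles."""
    return 1.0 - 0.5 ** (1.0 / k)

def leaky_update(old, best, eps):
    """One update of the leaky integrator 330 for a single delay value."""
    return (1.0 - eps) * old + eps * best

eps = smoothing_constant(10)        # about 0.067 for k = 10
tau = 0.0
for _ in range(10):                 # best-parameter latch jumps from 0 to 1 and stays
    tau = leaky_update(tau, 1.0, eps)
```

After 10 updates the smoothed value has covered half the distance to the new best value, matching the stated half-life of about 10 clock cycles.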

It remains to describe (a) the conditions under which the gate 270 opens, (b) how a parameter set in the parameter population storage 230(1-N) is chosen to be discarded each time the gate 270 opens, and (c) how the selector 290 chooses a parameter set to copy from the parameter population storage 230(1-N), based on information from the cumulative-performance comparator 310.

Item (a). Each time a parameter set, generated by the parameter generator 220, arrives at the parameter latch 230, it is supplied to the instantaneous-performance calculator 240 for operation thereon. Also, at each "slow" clock cycle, the N parameter sets in the parameter population storage 230(1-N) are supplied to the bank of N instantaneous-performance calculators 240(1-N) for operation thereon. The performance value computed by the instantaneous-performance calculator 240 is supplied to the instantaneous-performance latch 250. At the same time, the plurality of performance values computed by the plurality of instantaneous-performance calculators 240(1-N) are supplied to the instantaneous-performance population storage 340(1-N), respectively.

The values in the instantaneous-performance population storage 340(1-N) may change even during a "slow" clock cycle in which the contents of the parameter-population storage 230(1-N) do not change, since they depend on the input signals 26 and 28, $x_1(t)$ and $x_2(t)$, which do change.

Each of the instantaneous-performance calculators 240(1-N) and 240 calculates an instantaneous-performance value for the given parameter set in accordance with:

$$\text{instantaneous performance value} = -\operatorname{protlog}\left[a_0,\ \left(E\{y_1(t)\, y_2(t)\}\right)^2\right]$$

The notation $E\{\cdot\}$ denotes a time-windowed estimate of the expected value of the enclosed expression, as computed by a time average. In the preferred embodiment, this time average is computed over 50 milliseconds. The notation $\operatorname{protlog}[a_0, a]$ denotes a protected logarithm:

$$
\operatorname{protlog}[a_0, a] =
\begin{cases}
\log[a] & \text{if } a > a_0 \\
\log[a_0] & \text{otherwise}
\end{cases}
$$
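A minimal sketch of the protected logarithm and the instantaneous performance value follows. It assumes a rectangular window, so that $E\{\cdot\}$ is a plain mean over the samples passed in, and takes $a_0 = 10^{-20}$ (the value given later for the preferred embodiment). The 400-sample window corresponds to 50 ms only under an assumed 8 kHz sampling rate, and the test signals are hypothetical.

```python
import math
import numpy as np

A0 = 1e-20  # bound a0; the value given for the preferred embodiment

def protlog(a0, a):
    """Protected logarithm: log(a) if a > a0, else log(a0)."""
    return math.log(a) if a > a0 else math.log(a0)

def instantaneous_performance(y1, y2, a0=A0):
    """-protlog[a0, E{y1 y2}^2], with E{.} taken as a plain mean over the window."""
    c = float(np.mean(y1 * y2))
    return -protlog(a0, c * c)

rng = np.random.default_rng(3)
n = 400                                   # e.g. 50 ms at an assumed 8 kHz rate
a, b = rng.normal(size=n), rng.normal(size=n)
score_decorrelated = instantaneous_performance(a, b)
score_correlated = instantaneous_performance(a, a + 0.1 * b)
score_zero = instantaneous_performance(np.zeros(n), np.zeros(n))
```

Outputs that are nearly uncorrelated score high, strongly correlated outputs score near zero, and the protection caps the score at $-\log a_0$ when the correlation vanishes exactly, instead of diverging.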

As will be seen, the value of $y_1(t) \cdot y_2(t)$ is related to the input signals 26 and 28, and also depends on the values of $\tau_1$ and $\tau_2$ in a parameter set. Parameter sets with higher instantaneous performance values are deemed more consistent with recent input data $x_1(t)$ and $x_2(t)$ than are parameter sets with lower instantaneous performance values. The strategy of the remainder of the invention is to identify those parameter sets in the parameter population storage 230(1-N) that exhibit the highest instantaneous performance values consistently over time.

The instantaneous-performance calculator 240 and the N instantaneous-performance calculators 240(1-N) employ values supplied by the instantaneous-performance precalculator 350, which performs operations on the input data 26 and 28 that would otherwise constitute effort duplicated across the N+1 instantaneous-performance calculators 240 and 240(1-N). In particular, the quantity $E\{y_1(t)\, y_2(t)\}$ can be computed without explicitly computing $y_1(t)$ and $y_2(t)$ (see theory section, below).

Each time a value arrives at the instantaneous-performance latch 250, it is compared by the instantaneous-performance comparator 260 with the least value in the instantaneous-performance population storage 340(1-N). If it is less, the gate 270 remains closed. If it is greater, the gate 270 opens.

Item (b). At each "slow" clock cycle, each value in the instantaneous-performance population storage 340(1-N) is added to the corresponding value in the cumulative-performance population storage 360(1-N) by one of the adders in the bank of adders 370(1-N). Thus, each cell in the cumulative-performance population storage 360(1-N) maintains a cumulative sum of instantaneous-performance values. Other embodiments could employ a weighted sum or other arithmetic operation for incorporating new instantaneous-performance values into the cumulative performance.

Each time the gate 270 opens, the parameter set currently in the parameter latch 230 is copied to the parameter population storage 230(1-N), replacing the parameter set with least cumulative performance. Its cumulative performance is initialized to a value that falls short of the greatest value in the cumulative-performance population storage by a fixed amount determined in advance. This fixed value is chosen so that the cumulative performance of the parameter set newly introduced into the parameter-population storage 230(K) cannot exceed the currently greatest cumulative performance in the cumulative-performance population storage 360(1-N) until sufficient time has elapsed for the statistics to be reasonably reliable. In the preferred embodiment, sufficient time is deemed to be 100 milliseconds.

Item (c). At each "slow" clock cycle, the cumulative-performance comparator 310 identifies the greatest value in the cumulative-performance population storage 360(1-N) and supplies the index (1 through N) of that value to the selector 290. The selector 290 copies the corresponding parameter set from the parameter population storage 230(1-N) to the best-parameter latch 300 for further processing as described above.
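Items (a) through (c) can be summarized as one software loop per "slow" clock cycle. The sketch below is a simplification, not the circuit of Figure 8: it assumes that all gate decisions for the cycle are made before the cumulative update, and it uses a hypothetical noise-free scoring function in place of the decorrelation measure so the behavior can be traced by hand. The names `slow_cycle`, `perf`, and `offset` are illustrative.

```python
import numpy as np

def slow_cycle(population, cumulative, perf, candidates, offset):
    """One "slow" clock cycle of the population scheme, items (a) to (c).

    population: list of N parameter sets, e.g. (tau1, tau2) tuples (storage 230(1-N))
    cumulative: float array of N cumulative-performance values (storage 360(1-N))
    perf:       maps a parameter set to its instantaneous performance on the
                current data window (calculators 240 and 240(1-N))
    candidates: sets produced by the generator 220 during this cycle
    offset:     fixed amount by which a newcomer starts below the current best
    Returns the index that the selector 290 copies to the best-parameter latch 300.
    """
    inst = np.array([perf(p) for p in population], dtype=float)  # storage 340(1-N)
    for cand in candidates:
        value = perf(cand)
        if value > inst.min():                      # comparator 260 opens gate 270
            k = int(np.argmin(cumulative))          # set with least cumulative value
            population[k] = cand
            cumulative[k] = cumulative.max() - offset
            inst[k] = value
    cumulative += inst                              # adders 370(1-N)
    return int(np.argmax(cumulative))               # comparator 310, selector 290

def perf(p):
    """Hypothetical noise-free score whose best set is (3, 5)."""
    return -float((p[0] - 3) ** 2 + (p[1] - 5) ** 2)

population = [(0, 0), (1, 1), (2, 2)]
cumulative = np.zeros(3)
best1 = slow_cycle(population, cumulative, perf, [(3, 5)], offset=5.0)
best2 = slow_cycle(population, cumulative, perf, [(9, 9)], offset=5.0)
```

In the first cycle the candidate (3, 5) beats the worst instantaneous value, replaces the set with least cumulative performance, and starts 5.0 below the previous best; in the second cycle the poor candidate (9, 9) leaves the gate closed.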

The generator 220 can generate $\tau_1$ and $\tau_2$ in a number of different ways. First, the generator 220 can draw the delay values randomly from a predetermined distribution of $\tau_1$ and $\tau_2$ values. Second, the values of $\tau_1$ and $\tau_2$ can be estimated from the incoming data streams using crude methods that exist in the prior art. Alternatively, they can be variants of existing values of $\tau_1$ and $\tau_2$. The values of $\tau_1$ and $\tau_2$ can also be based upon prior models of how the sources of the acoustic waves are expected to move in physical space. Further, the values of $\tau_1$ and $\tau_2$ can be determined by another adaptive filtering method, including well known conventional methods such as a stochastic-gradient method. Finally, if the number and required resolution of the filter coefficients are low enough that the initial sets of delay values contain all the possible delay values, then clearly the generator 220 can even be omitted.



Theory

Problems in adaptive filtering can generally be described as follows. One or more time-varying input signals $x_i(t)$ are fed into a filter structure H with adjustable coefficients $h_k$, i.e., $H = H(h_1, h_2, \ldots)$. Output signals $y_j(t)$ emerge from the filter structure. For example, an adaptive filter with one input $x(t)$ and one output $y(t)$ in which H is a linear, causal FIR filter might take the form:

$$y(t) = \sum_k h_k\, x(t - k\,\Delta t).$$

In general, H might be a much more complicated filter structure with feedback, cascades, parallel elements, nonlinear operations, and so forth.
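As a concrete instance of the FIR form above, the sketch below evaluates the sum directly. It treats $\Delta t$ as one sample and $x$ as zero before the first sample; the taps and input values are arbitrary.

```python
import numpy as np

def fir_filter(h, x):
    """y[n] = sum_k h[k] * x[n - k], with x taken as zero before n = 0."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        for k in range(len(h)):
            if n - k >= 0:
                y[n] += h[k] * x[n - k]
    return y

h = np.array([0.5, 0.25])        # adjustable coefficients h_k (arbitrary values)
x = np.array([1.0, 2.0, 3.0])
y = fir_filter(h, x)             # [0.5, 1.25, 2.0]
```

The same result is the first `len(x)` samples of `np.convolve(h, x)`; adaptive filtering consists of adjusting `h` while the data stream through.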

The object is to adjust the unknowns $h_k$ such that a performance function C(t) is maximized consistently over time. The performance function is usually a statistic computed over recent values of the outputs $y_j(t)$; but in general, it may also depend on the input signals, the filter coefficients, or other quantities such as parameter settings supplied exogenously by a human user.

For example, consider the 2x2 case of the preferred embodiment of the direct-signal separator 30. Each of the input signals, $x_1(t)$ or $x_2(t)$, comes from one omnidirectional microphone. Each omnidirectional microphone receives energy from two source signals, $s_1(t)$ and $s_2(t)$. The structure H is a feed-forward network, with time delays, that implements the following relations:

$$y_1(t) = x_1(t) - x_2(t - \tau_2), \quad \text{and}$$
$$y_2(t) = x_2(t) - x_1(t - \tau_1).$$

The adjustable coefficients are $\tau_1$ and $\tau_2$; i.e., $H = H(\tau_1, \tau_2)$. Note that even when the signals are sampled, $\tau_1$ and $\tau_2$ need not be integral multiples of $\Delta t$.

The goal is to recover versions of $s_1(t)$ and $s_2(t)$, possibly after some frequency coloration, from $x_1(t)$ and $x_2(t)$, i.e., to make $y_1(t)$ a filtered version of $s_1(t)$ only, and $y_2(t)$ a filtered version of $s_2(t)$ only, or vice versa.

The pivotal assumption is that $s_1(t)$ and $s_2(t)$ are statistically independent. For mathematical convenience, it is also assumed that both have (possibly different) probability distributions symmetric about zero. (For example, speech signals have probability distributions that are sufficiently symmetric about zero, though not perfectly so.) These two assumptions imply that

$$E[f(s_1(t))\, g(s_2(t))] = 0$$

holds for every possible choice of the two odd functions f and g. (Independence factors the expectation into $E[f(s_1(t))]\, E[g(s_2(t))]$, and each factor vanishes because an odd function of a variable distributed symmetrically about zero averages to zero.) (The notation $E[\cdot]$ denotes a time-windowed estimate of the expected value of the enclosed expression, as computed by a time average. In the preferred embodiment, this time average is computed over 50 milliseconds. Typical weighting functions are rectangular or exponential, but many others are also used.)
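The independence condition can be illustrated numerically. The sketch below assumes a Laplace distribution as a symmetric, speech-like stand-in for $s_1$ and a uniform distribution for $s_2$ (both are our choices, not the document's), estimates $E[f(s_1)\, g(s_2)]$ by a sample mean for a few pairs of odd functions, and contrasts this with a pair of signals that share a component.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200_000
s1 = rng.laplace(scale=0.5, size=n)     # symmetric, speech-like stand-in (assumption)
s2 = rng.uniform(-1.0, 1.0, size=n)     # independent of s1, also symmetric

def ident(u):
    return u

def cube(u):
    return u ** 3

odd_pairs = [(ident, ident), (cube, np.tanh), (np.tanh, np.sign)]
independent_estimates = [float(np.mean(f(s1) * g(s2))) for f, g in odd_pairs]

# For contrast, a signal that shares a component with s1 is not independent of it.
dependent_estimate = float(np.mean(s1 * (s1 + 0.5 * s2)))
```

For the independent pair every estimate is close to zero, limited only by the finite averaging length; for the dependent pair the identity pair already gives a value near $E[s_1^2] = 0.5$.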

In the absence of noise, echoes, reverberation, and certain mathematical degeneracies, the desired goal is accomplished when H is set such that the relation of statistical independence holds for $y_1(t)$ and $y_2(t)$ as well, i.e.,

$$E[f(y_1(t))\, g(y_2(t))] = 0$$

for every possible choice of the two odd functions f and g.

To specify these conditions in a form appropriate for the present invention, it is necessary to define some performance function C(t) that is largest when the conditions of statistical independence are closest to being satisfied. In practice, a number of choices must be made in the definition of C(t). First, although the number of equations that must be satisfied for true statistical independence is infinite, it is generally necessary to choose a small number of pairs of functions f and g for which the equations are to be satisfied. Second, the time window over which the expectation operator $E[\cdot]$ performs averaging must be chosen. Third, the averages used to compute the function C(t) can be defined in one of two ways: direct or retrospective.

Each of these choices can affect the speed at which the system adapts to changes in the configuration of sources, and the computational demands that it places on the underlying hardware. With some combinations of these choices, including those made in the preferred embodiment, the present invention provides advantages not found with competing methods such as LMS or RLS.

We now discuss the three choices, their effects on speed and efficiency, and the 
conditions under which the present invention provides unique advantages. 

The first choice concerns which pairs of f and g functions to use in enforcing the statistical independence conditions. One familiar with the art will recognize that a variety of ways to enforce the statistical-independence relations exist, and that any one of these could be used without departing from the spirit and scope of the invention. In the preferred embodiment of the present invention, the relations are enforced by incorporating them into a least-squares error function of the following form:

$$e(t) = \sum_k c_k\, e_k(t)$$

where $c_k$ is a weighting factor,

$$e_k(t) = \left\{ E[f_k(y_1(t))\, g_k(y_2(t))] \right\}^2,$$

and $f_k$ and $g_k$ are odd functions. Often they are defined to be odd polynomial terms, such as odd powers of y. The aim is to minimize $e(t)$. The performance function C(t) is then defined thus:

$$C(t) = -\operatorname{protlog}[a_0, e(t)],$$

and the goal is then to maximize C(t). The notation $\operatorname{protlog}[a_0, a]$ denotes a protected logarithm:

$$
\operatorname{protlog}[a_0, a] =
\begin{cases}
\log[a] & \text{if } a > a_0 \\
\log[a_0] & \text{otherwise.}
\end{cases}
$$

In the preferred embodiment, the bound $a_0$ is equal to 1e-20.
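The general performance function can be sketched in a few lines. The pairs of odd functions and the weights $c_k$ below are hypothetical choices for illustration, and $E[\cdot]$ is again approximated by a plain mean over the samples supplied.

```python
import math
import numpy as np

def protlog(a0, a):
    """Protected logarithm: log(a) if a > a0, else log(a0)."""
    return math.log(a) if a > a0 else math.log(a0)

def performance(y1, y2, pairs, weights, a0=1e-20):
    """C(t) = -protlog[a0, e(t)], e(t) = sum_k c_k * (E[f_k(y1) g_k(y2)])**2.

    E[.] is approximated by a plain mean over the arrays passed in.
    """
    e = 0.0
    for (f, g), c in zip(pairs, weights):
        e += c * float(np.mean(f(y1) * g(y2))) ** 2
    return -protlog(a0, e)

def ident(u):
    return u

def cube(u):
    return u ** 3

pairs = [(ident, ident), (cube, ident)]   # hypothetical choice of odd pairs
weights = [1.0, 0.5]                      # hypothetical weights c_k

rng = np.random.default_rng(5)
s1, s2 = rng.normal(size=400), rng.normal(size=400)
c_separated = performance(s1, s2, pairs, weights)
c_mixed = performance(s1, s1 + s2, pairs, weights)
```

Adding more pairs makes $e(t) = 0$ a stronger condition, at the cost of computation and noise sensitivity, which is the tradeoff discussed next.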

Given this definition of C(t), the first choice amounts to the selection of odd functions $f_k$ and $g_k$ and weighting coefficients $c_k$. Each function $f_k$ and $g_k$ can be linear or nonlinear, and any number of pairs of f and g functions can be used. These choices have been debated extensively in the published literature because they involve a tradeoff between performance and mathematical sufficiency. On one hand, raw performance issues call for the number of pairs of functions f and g to be small, and for the functions themselves to be simple; otherwise C(t) becomes intensive to compute, difficult to optimize (because of multiple competing optima), and sensitive to noise. On the other hand, if those guidelines are carried too far, the performance function C(t) becomes "instantaneously underdetermined." That is, at any given time t, the relation $e(t) = 0$ does not uniquely determine H; the value of H that causes $y_1(t)$ and $y_2(t)$ to be filtered versions of $s_1(t)$ and $s_2(t)$ is only one of a family of values of H that satisfy $e(t) = 0$.

The second choice pertains to the length of the window over which the expectation operator $E[\cdot]$ performs averaging. This choice gives rise to a tradeoff between adaptation rate and fluctuation. On one hand, if averaging times are too long, the system will adapt to changes slowly because the statistics that make up C(t) change slowly. On the other hand, if the averaging times are too short, C(t) will be noisy. Moreover, within short time windows it is possible for the pivotal assumption of statistical independence to be violated transiently.

The third choice pertains to precisely how the averaging done by the expectation operator $E[\cdot]$ is defined. This choice has an impact on how quickly and accurately the invention can adapt to the incoming signals. Consider the expression $E[f_k(y_1(t))\, g_k(y_2(t))]$, which involves averaging over the outputs of the invention. The outputs depend, in turn, on the transfer function H, which varies in time as the invention adapts to the incoming signals. One method is to perform the time average using the actual past values of $y_1(t)$ and $y_2(t)$. This is the conventional averaging procedure. Although simpler to implement, this method creates a problem: the averaging window may include past times during which the output signals $y_1(t)$ and $y_2(t)$ were computed with past values of H that were either much better or much worse than the current H at separating the sources. Consequently, with conventional averaging the performance function C(t) is not an accurate measure of how well the invention is currently separating the signals.

The alternative is to perform the averaging retrospectively: compute the time average using the past values that $y_1(t)$ and $y_2(t)$ would have taken, had the current value of H been used to compute them. This way, the performance function C(t) is a more accurate measure of the performance of the current value of H, and adaptation to movement of the sources can occur more quickly. However, this method of computing the average calls for a somewhat more complicated technique for processing the input signals $x_1(t)$ and $x_2(t)$. The technique used in the preferred embodiment is described below.
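The difference between the two averaging procedures shows up in a small simulation. In the sketch below, all values are hypothetical: the sources are white noise colored by an 8-sample moving average, the filter runs with a poor delay estimate (0, 0) for the first half of the record and with the true delays (3, 5) afterwards, delays are integer samples, and the averaging window spans most of the record (much longer than 50 ms) to keep the estimates steady. The conventional average uses the outputs actually produced; the retrospective average replays the window through the current H.

```python
import numpy as np

def delayed(x, d):
    """x(t - d) for an integer delay d >= 0 samples (zeros before t = 0)."""
    out = np.zeros_like(x)
    out[d:] = x[:len(x) - d]
    return out

def separate(x1, x2, tau):
    """2x2 direct separator with integer delays tau = (tau1, tau2)."""
    t1, t2 = tau
    return x1 - delayed(x2, t2), x2 - delayed(x1, t1)

rng = np.random.default_rng(7)
T = 100_000
smooth = np.ones(8) / 8                      # colors the hypothetical sources
w1 = np.convolve(rng.normal(size=T), smooth)[:T]
w2 = np.convolve(rng.normal(size=T), smooth)[:T]
x1 = w1 + delayed(w2, 5)
x2 = w2 + delayed(w1, 3)

# Outputs actually produced: poor estimate first, true delays from halfway on.
old, new = separate(x1, x2, (0, 0)), separate(x1, x2, (3, 5))
half = T // 2
y1_hist = np.concatenate([old[0][:half], new[0][half:]])
y2_hist = np.concatenate([old[1][:half], new[1][half:]])

window = slice(100, T)
conventional = np.mean(y1_hist[window] * y2_hist[window])   # mixes old and new H
y1_now, y2_now = separate(x1, x2, (3, 5))                   # replay with current H
retrospective = np.mean(y1_now[window] * y2_now[window])
```

Although the current H separates the sources perfectly, the conventional average still reports a large correlation inherited from the earlier H, while the retrospective average is close to zero.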

A key advantage of the present invention over existing adaptive filtering techniques is its relative imperviousness to the effects of mathematical insufficiency and of fluctuation in C(t). These properties permit the first two choices to be made in a manner that favors performance and adaptation rate.

In particular, consider the behavior of the performance function C(t) in the preferred embodiment. Only one pair of functions f, g is used, and both functions are linear: $f(y) = y$, $g(y) = y$, so that C(t) is maximized by driving the expression

$$E[y_1(t)\, y_2(t)]$$

as close as possible to zero. This condition, known as decorrelation, is a much less stringent condition than statistical independence. It has long been recognized in the literature (see, for example, Cardoso, 1989) that decorrelation over a single time window, even a long one, is an insufficient condition for source separation. In the terms used here, the resulting performance function is instantaneously underdetermined. Moreover, in the preferred embodiment, the expectation operator $E[\cdot]$ averages over a very short time window, 50 milliseconds; this causes the performance function C(t) to be sensitive to noise and to transient departures of the sources from statistical independence.

It is under conditions of instantaneous underdetermination and rapid fluctuation of C(t) that the greatest advantages of the present invention over the prior art are realized. The so-called Least Squares methods, of which Recursive Least Squares (RLS) is the archetype, cannot be used because they effectively compute, at each time step, a unique global optimum of the performance function. (RLS performs this computation in an efficient manner by updating a past solution with data from the current time window; nonetheless, the resulting solution is the unique global optimum.) In the presence of instantaneous underdetermination, no unique global optimum exists.

Under conditions of instantaneous underdetermination, LMS and GA solutions exist but are expected to fail. This is because both methods attempt to move toward the global optimum of the performance function at any given instant in time, using the gradient of the performance function. Under the conditions of instantaneous underdetermination presently under discussion, the theoretical global optimum is not a single point on the performance surface, but an infinite collection of points spread over a large region of the parameter space. At any given instant, LMS or GA techniques will attempt to move their parameter estimates toward one of these optimal points. Subsequent linear smoothing will produce a weighted mean of these estimates, which in general may be very different from the ideal value. Instead of producing a weighted mean of estimates, the present invention effectively calculates a weighted mode: it chooses the single estimate that most consistently optimizes the performance surface over time.

The present invention copes well with instantaneous underdetermination only under certain conditions. In particular, the preferred embodiment of the present invention exploits the fact that if the sources $s_1(t)$ and $s_2(t)$ vary in their relative power, as is true in conversational speech, the relation $e(t) = 0$ holds true consistently over time only for the correct value of H. Other values of H can satisfy the relation only transiently. The behavior of the present invention is to choose $\tau_1$ and $\tau_2$ such that C(t) is maximized consistently over time.

The strategy of maximizing C(t) consistently over time is an advantage of the present invention over existing LMS and GA based algorithms even when instantaneous underdetermination is absent and a weighted mean does converge to the ideal parameters. Even under such favorable conditions, LMS and GA based algorithms may still produce a parameter set that fluctuates rapidly, albeit with values centered about the ideal. For certain applications, these fluctuations would detract from the usefulness or attractiveness of the system. They may be removed by the conventional technique of linear smoothing, but usually at the expense of adaptation speed: the system will react slowly to true changes in the ideal parameters. In contrast, the present invention can produce non-fluctuating parameters while retaining the ability to adapt quickly. This is because noise fluctuations (and fluctuations in C(t) due to transient departures of the sources from statistical independence) are translated into fluctuations in the cumulative performance values, not in the parameters themselves.

An important advantage of the present invention over the so-called Least-Squares methods, of which Recursive Least Squares (RLS) is the archetype, is its generality. Least-Squares methods can be used only when the form of C(t) permits its global maximum to be computed analytically. The present invention has no such requirement.

In the present invention, the third choice (conventional vs. retrospective averaging) is especially critical. The invention relies on being able to obtain a rapid, accurate estimate of how well a given trial value of H would separate the signals, and therefore essentially requires that averaging be performed retrospectively. In addition, unlike conventional averaging, retrospective averaging can permit the bulk of the processing effort to be performed in a precomputation step that is common to all population members.

In the preferred embodiment, this precomputation proceeds as follows. As previously discussed, the purpose of the performance calculator is to calculate C(t) in accordance with:

$$C(t) = -\operatorname{protlog}[a_0, e(t)],$$

where

$$e(t) = \left(E[y_1(t)\, y_2(t)]\right)^2.$$

The expectation operator $E[\cdot]$ is expensive to calculate, but the computation can be structured so that multiple calculations of C(t) for the same time but different $\tau_1$ and $\tau_2$ can share many operations. As described above, the time delays $\tau_1$ and $\tau_2$ are implemented by convolution with sinc-shaped filters $w_1$ and $w_2$:

$$y_1(t) = x_1(t) - \sum_{k_2} w_2(\tau_2, k_2)\, x_2(t - k_2 \Delta t)$$
$$y_2(t) = x_2(t) - \sum_{k_1} w_1(\tau_1, k_1)\, x_1(t - k_1 \Delta t)$$

Since the expectation operator $E[\cdot]$ is linear, it follows that the quantity in question can be expressed as follows:

$$
\begin{aligned}
E\{y_1(t)\, y_2(t)\} = {} & E\{x_1(t)\, x_2(t)\} \\
& - \sum_{k_1} w_1(\tau_1, k_1)\, e_{11}(0, k_1) \\
& - \sum_{k_2} w_2(\tau_2, k_2)\, e_{22}(0, k_2) \\
& + \sum_{k_1} \sum_{k_2} w_1(\tau_1, k_1)\, w_2(\tau_2, k_2)\, e_{12}(k_1, k_2)
\end{aligned}
$$

where $w_1$ and $w_2$ are coefficients that depend on the delays but not on the input data, and

$$e_{12}(k_1, k_2) = E\{x_1(t - k_1 \Delta t)\, x_2(t - k_2 \Delta t)\},$$

with $e_{11}$ and $e_{22}$ defined in the same way but with both factors taken from the same channel. The quantities $e_{12}(k_1, k_2)$, $e_{11}(0, k_1)$, and $e_{22}(0, k_2)$ can be precalculated once per clock cycle for each $(k_1, k_2)$ pair, leaving the sums over $k_1$ and $k_2$ to be calculated once for each $(\tau_1, \tau_2)$ parameter set.

When the number of input channels exceeds two, the quantity

$$e_{ij}(k_i, k_j) = E\{x_i(t - k_i \Delta t)\, x_j(t - k_j \Delta t)\}$$

must be precalculated for every $(i, j)$ pair, and for each such pair, for every $(k_i, k_j)$ pair.
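The two-channel precomputation can be sketched as follows. The text says only that the delay filters are "sinc-shaped"; the plain truncated sinc with 16 taps, the zero padding before the start of the record, and the random test data are assumptions of this sketch. `precompute` plays the role of the precalculator 350 and runs once per window, after which each trial $(\tau_1, \tau_2)$ set costs only small dot products; `corr_direct` forms $y_1$ and $y_2$ explicitly for comparison.

```python
import numpy as np

K = 16  # number of filter taps (hypothetical)

def sinc_taps(tau):
    """Taps w(tau, k), k = 0..K-1, of a plain truncated sinc delay of tau samples."""
    return np.sinc(np.arange(K) - tau)

def lagged(x, k):
    """x(t - k dt), zero before the start of the record."""
    out = np.zeros_like(x)
    out[k:] = x[:len(x) - k]
    return out

def precompute(x1, x2):
    """Data-only terms shared by every parameter set (precalculator 350)."""
    n = len(x1)
    X1 = np.stack([lagged(x1, k) for k in range(K)])   # row k is x1(t - k dt)
    X2 = np.stack([lagged(x2, k) for k in range(K)])
    return {
        "e12_00": float(np.mean(x1 * x2)),   # E{x1(t) x2(t)}
        "e11_0k": X1 @ x1 / n,               # e11(0, k1)
        "e22_0k": X2 @ x2 / n,               # e22(0, k2)
        "e12": X1 @ X2.T / n,                # e12(k1, k2)
    }

def corr_precomputed(pre, tau1, tau2):
    """E{y1 y2} for one (tau1, tau2) set, using only the precomputed terms."""
    w1, w2 = sinc_taps(tau1), sinc_taps(tau2)
    return (pre["e12_00"] - w1 @ pre["e11_0k"] - w2 @ pre["e22_0k"]
            + w1 @ pre["e12"] @ w2)

def corr_direct(x1, x2, tau1, tau2):
    """Same quantity computed by forming y1 and y2 explicitly (for checking)."""
    w1, w2 = sinc_taps(tau1), sinc_taps(tau2)
    y1 = x1 - sum(w2[k] * lagged(x2, k) for k in range(K))
    y2 = x2 - sum(w1[k] * lagged(x1, k) for k in range(K))
    return float(np.mean(y1 * y2))

rng = np.random.default_rng(11)
x1, x2 = rng.normal(size=2000), rng.normal(size=2000)
pre = precompute(x1, x2)                      # once per clock cycle
trial_sets = [(2.0, 4.0), (2.5, 3.7), (6.3, 1.1)]
fast = [corr_precomputed(pre, a, b) for a, b in trial_sets]
slow = [corr_direct(x1, x2, a, b) for a, b in trial_sets]
```

Both routes give the same value for every trial set, because the expansion above is exact under linearity; the saving is that the O(window length) work is done once rather than once per population member.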

In the preferred embodiment of the present invention, the invention has been implemented in software, as set forth in the microfiche appendix. The software code is written in the C++ language for execution on a workstation from Silicon Graphics. However, as previously discussed, hardware implementation of the present invention is also contemplated to be within the scope of the present invention. Thus, for example, the direct signal separator 30 and the crosstalk remover 50 can be a part of a digital signal processor, or can be a part of a general purpose computer.

What Is Claimed Is:

1. A method of determining a set of a plurality of signal parameters for use in an adaptive filter signal processor to process a plurality of input signals to generate a plurality of processed signals, said method comprising:

a) generating a fixed plurality of sets;

b) storing said fixed plurality of sets;

c) generating a plurality of cumulative performance values, with a cumulative performance value corresponding to each of said fixed plurality of sets;

d) comparing said plurality of cumulative performance values;

e) choosing one of said plurality of cumulative performance values, based upon said comparison;

f) processing said plurality of input signals for a duration of time using one set, corresponding to said chosen one cumulative performance value, from said stored fixed plurality of sets to generate the plurality of processed signals;

g) generating a new set of a plurality of signal parameters;

h) generating a plurality of cumulative performance values for said new set and for each of said stored fixed plurality of sets;

i) comparing said plurality of cumulative performance values generated in step (h);

j) choosing one of said plurality of cumulative performance values generated in step (h), based upon the comparing step of (i);

k) based upon the comparing step of (i), either:

i) replacing one of said stored fixed plurality of sets by said new set; or

ii) deleting said new set;

l) processing said plurality of input signals for a duration of time using one set, corresponding to said chosen one cumulative performance value, from said stored fixed plurality of sets to generate the plurality of processed signals; and

m) periodically reverting to steps (g)-(l).

2. The method of claim 1, wherein said generating step (h) comprises arithmetically combining random values from a pseudorandom number generator, one set in said plurality of sets, and recent values in said plurality of input signals.

3. The method of claim 1 wherein:

said comparing step (i) compares the cumulative performance value of said new set with the cumulative performance values of one of said stored fixed plurality of sets having a least performance value;

and wherein said replacing step (k)(i) replaces one of said stored fixed plurality of sets having said least performance value.

4. The method of claim 1, wherein:
said generating step (h) further comprises generating an initial cumulative performance value
for said new set by subtracting a fixed value from a cumulative performance value of one of
said stored fixed plurality of sets having the greatest cumulative performance value; and
wherein said choosing step (j) chooses one of said plurality of cumulative performance values
generated in step (h) having the largest cumulative performance value.
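Claims 1-4 describe a fixed-size population of candidate parameter sets that is continually refreshed. The Python sketch below illustrates one reading of that loop; it is not the code of the microfiche appendix. The names `Population` and `perturb`, the Gaussian perturbation, and the treatment of cumulative values as running sums of instantaneous values are all assumptions, and the recent-input term of claim 2 is omitted.

```python
import random
from typing import Callable, List, Sequence

Params = List[float]
Perf = Callable[[Params], float]  # instantaneous performance of one parameter set


def perturb(parent: Params, rng: random.Random, step: float) -> Params:
    """Hypothetical new-set generator: a stored set plus pseudorandom noise.
    The recent-input term mentioned in claim 2 is left out of this sketch."""
    return [v + rng.gauss(0.0, step) for v in parent]


class Population:
    """Fixed-size population of parameter sets with cumulative scores (assumed form)."""

    def __init__(self, sets: Sequence[Params], penalty: float):
        self.sets = [list(s) for s in sets]   # stored fixed plurality of sets
        self.scores = [0.0] * len(self.sets)  # one cumulative value per set
        self.penalty = penalty                # the fixed value of claim 4

    def _argmax(self) -> int:
        return max(range(len(self.scores)), key=self.scores.__getitem__)

    def _argmin(self) -> int:
        return min(range(len(self.scores)), key=self.scores.__getitem__)

    def update(self, perf: Perf, candidate: Params) -> bool:
        """One pass of steps (g)-(k); returns True if the candidate is kept."""
        # Initial cumulative value of the new set: best stored value minus a fixed value.
        new_score = self.scores[self._argmax()] - self.penalty
        # Add this cycle's instantaneous performance to every cumulative value.
        for i, s in enumerate(self.sets):
            self.scores[i] += perf(s)
        new_score += perf(candidate)
        # Compare with the least cumulative value: replace that set or drop the candidate.
        worst = self._argmin()
        if new_score > self.scores[worst]:
            self.sets[worst] = list(candidate)
            self.scores[worst] = new_score
            return True
        return False

    def chosen(self) -> Params:
        """Set handed to the processing step: the largest cumulative value."""
        return self.sets[self._argmax()]
```

In use, each period might call `pop.update(perf, perturb(rng.choice(pop.sets), rng, 0.05))` and then process the inputs with `pop.chosen()`; the step size 0.05 is arbitrary. Starting a newcomer just below the current leader keeps a single lucky evaluation from immediately displacing a set with a long good record.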
5. The method of claims M, wherein said generating steps of (c) and (h) further
comprise:

generating a plurality of instantaneous performance values, each at a different time,
for each set; and

combining said plurality of instantaneous performance values for each set to generate
said plurality of cumulative performance values;

and wherein said plurality of input signals is characterized as $x_1(t), x_2(t), \ldots, x_n(t)$, said
plurality of processed signals is characterized as $y_1(t), y_2(t), \ldots, y_n(t)$, and each of said
choosing steps (e) and (j) selects a set having associated instantaneous performance values
that are highest for the sets that would generate processed signals $y_i(t)$ and $y_j(t)$ that are
most statistically independent for different i and j.

6. The method of claim 5, wherein said plurality of input signals and said plurality
of processed signals are assumed to have zero mean and to fluctuate in power over a few
clock cycles, and said choosing steps (e) and (j) select a set for which the associated
instantaneous performance values are highest for the sets that would generate processed
signals $y_i(t)$ and $y_j(t)$ that most closely satisfy the relation $E[y_i(t)\,y_j(t)] = E[y_i(t)]\,E[y_j(t)]$ for
any different indices i and j, where the operation $E[\ldots]$ is a weighted average over a period of
time of said values of $x_1(t), x_2(t), \ldots, x_n(t), y_1(t), y_2(t), \ldots, y_n(t)$.

7. The method of claim 6, wherein said choosing step of (e) and (h) computes an
instantaneous performance value in accordance with:

$$C(t) = -\log\left(\sum_{i \neq j} \left(E[y_i(t)\,y_j(t)]\right)^2\right),$$

and wherein the logarithm computation is protected against numerical overflow.
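The performance value of claim 7 can be sketched as below, assuming the weighted average $E[\ldots]$ is an exponential moving average with a hypothetical smoothing factor `alpha`, and assuming the overflow protection takes the form of a small floor on the argument of the logarithm, so that perfectly decorrelated outputs yield a large finite value rather than infinity. The class name is illustrative.

```python
import math
from typing import List, Sequence


class DecorrelationScore:
    """C(t) = -log(sum_{i != j} E[y_i y_j]^2), with E[...] an exponentially weighted average."""

    def __init__(self, n: int, alpha: float, floor: float = 1e-12):
        self.n = n
        self.alpha = alpha  # weight given to the newest product y_i(t) y_j(t)
        self.floor = floor  # keeps -log() finite when the outputs are fully decorrelated
        self.corr: List[List[float]] = [[0.0] * n for _ in range(n)]

    def update(self, y: Sequence[float]) -> float:
        a = self.alpha
        for i in range(self.n):
            for j in range(self.n):
                self.corr[i][j] = (1.0 - a) * self.corr[i][j] + a * y[i] * y[j]
        # Sum of squared cross-correlations over all ordered pairs with i != j.
        total = sum(self.corr[i][j] ** 2
                    for i in range(self.n) for j in range(self.n) if i != j)
        return -math.log(max(total, self.floor))
```

A larger C(t) indicates less correlated outputs, which is why the choosing steps favor the highest values.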

8. The method of claim 7, further comprising the step of:

generating said plurality of input signals by a plurality of transducer means,
based upon waves received by said plurality of transducer means from a plurality of
sources.

9. The method of claim 8 wherein said plurality of input signals is generated by a 
plurality of transducer means, based upon acoustic waves received by said plurality of 
transducer means from a plurality of sources. 

10. The method of claim 8, wherein said plurality of input signals is generated by a
plurality of transducer means, based upon electromagnetic waves received by said plurality of
transducer means from a plurality of sources.

11. The method of claim 8, wherein said transducer means are directional and
positioned to minimize relative propagation delays.

12. The method of claim 11, wherein each set of said plurality of signal parameters
represents values of relative gains in transduction of said waves from said waves by said
plurality of transducer means.

13. The method of claim 12, wherein said choosing steps (e) and (j) compute said
instantaneous performance value in accordance with:

$Y(t) = G^{-1} X(t)$, where X and Y are the vector representations of the signals $x_i(t)$ and
$y_i(t)$, G is the matrix representation of said relative gains, and $G^{-1}$ is its inverse matrix.
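For two co-located directional microphones, the relation $Y(t) = G^{-1} X(t)$ of claim 13 reduces to inverting a 2x2 gain matrix. A minimal sketch, assuming the mixing model $X(t) = G\,S(t)$ with G supplied as nested lists; the function name `unmix_2x2` is hypothetical.

```python
from typing import Sequence, Tuple


def unmix_2x2(g: Sequence[Sequence[float]], x: Sequence[float]) -> Tuple[float, float]:
    """Apply the inverse of a 2x2 gain matrix G to one input sample vector X(t)."""
    (a, b), (c, d) = g
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("gain matrix is singular; the sources cannot be separated")
    # inv([[a, b], [c, d]]) = [[d, -b], [-c, a]] / det
    y1 = (d * x[0] - b * x[1]) / det
    y2 = (-c * x[0] + a * x[1]) / det
    return y1, y2
```

For example, with G = [[1.0, 0.5], [0.25, 1.0]] and sources (2.0, -1.0), the microphones read (1.5, -0.5), and `unmix_2x2` recovers the sources.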

14. The method of claim 13, wherein said choosing steps (e) and (j) comprise
computing Y(t) explicitly from the currently available input signals X(t); and
calculating said instantaneous performance value by operating on the values Y(t) for
at least a duration of time equal to the averaging time of the expectation operator E[].

15. The method of claim 13, further comprising the step of storing said plurality of
input signals X(t) for at least a duration of time equal to the averaging time of the
expectation operator E[]; and

wherein said choosing steps (e) and (j) compute Y(t) explicitly from said storage of
input signals X(t), and subsequently said instantaneous performance value.

16. The method of claim 13, wherein said choosing steps (e) and (j) comprise the
steps of:

precalculating, once per clock cycle, the quantity
$e_{ij} = E\{x_i(t)\,x_j(t)\}$
for every (i, j) pair; and

computing said instantaneous performance value C(t), as required, from the
quantities $e_{ij}$.
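Claim 16 relies on the linearity of $Y(t) = G^{-1} X(t)$: writing $W = G^{-1}$, a fixed W gives $E[y_i y_j] = \sum_k \sum_l W_{ik} W_{jl}\, e_{kl}$, so the input correlations $e_{kl}$ need updating only once per clock cycle, and every candidate set can then be scored without re-filtering the signals. The sketch below assumes an exponentially weighted average with a hypothetical factor `alpha` and the same floor-protected logarithm as in claim 7; the function names are illustrative.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def update_input_correlations(e: Matrix, x: Sequence[float], alpha: float) -> None:
    """Once per clock cycle: e[i][j] <- weighted average of x_i(t) x_j(t)."""
    n = len(x)
    for i in range(n):
        for j in range(n):
            e[i][j] = (1.0 - alpha) * e[i][j] + alpha * x[i] * x[j]


def score_from_correlations(w: Sequence[Sequence[float]], e: Sequence[Sequence[float]],
                            floor: float = 1e-12) -> float:
    """C(t) for a candidate unmixing matrix w, computed only from the e[k][l]."""
    n = len(w)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # E[y_i y_j] = sum_k sum_l w[i][k] w[j][l] e[k][l] because y = w x
            r = sum(w[i][k] * w[j][l] * e[k][l] for k in range(n) for l in range(n))
            total += r * r
    return -math.log(max(total, floor))
```

With `alpha = 1.0` the average collapses to the current sample, which makes the bookkeeping easy to check by hand.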

17. The method of claim 8, wherein
said plurality of signal parameters represent the coefficients of filters that model the transfer
function of propagation of said waves from said plurality of sources to said plurality of
transducer means; and

said choosing steps (e) and (j) compute said instantaneous performance value in accordance
with

$Y(t) = H^{-1} X(t)$, where X and Y are the vector representations of the signals $x_i(t)$ and
$y_i(t)$, H is the matrix representation of said filters, and $H^{-1}$ is its inverse matrix.

18. The method of claim 17, wherein said choosing steps (e) and (j) comprise the
steps of:

computing Y(t) explicitly from the currently available input signals X(t); and
calculating said instantaneous performance value by operating on the values
Y(t) for at least a duration of time equal to the averaging time of the expectation operator
E[] plus the longest duration of said filters.

19. The method of claim 17, further comprising the step of storing said plurality of
input signals X(t) for at least a duration of time equal to the averaging time of the
expectation operator E[] plus the longest duration of said filters; and

wherein said choosing steps (e) and (j) compute Y(t) explicitly from said storage of input
signals X(t), and subsequently said instantaneous performance value.



20. The method of claim 17, wherein said choosing steps (e) and (j) comprise the
steps of:

precalculating, once per clock cycle, the quantity
$e_{ij}(k_i, k_j) = E\{x_i(t - k_i\Delta t)\,x_j(t - k_j\Delta t)\}$

for every (i, j) pair, and for each such pair, for every $(k_i, k_j)$ pair; and

computing C(t) from the quantities $e_{ij}(k_i, k_j)$ every time the instantaneous
performance value is required.






21. The method of claim 8, wherein said transducer means are spaced apart and
omnidirectional; and wherein said plurality of signal parameters represent values of relative
delays in propagation of said waves from said plurality of sources to said plurality of
transducer means; and wherein said choosing steps (e) and (j) compute said instantaneous
performance value using the definition $Y(t) = \operatorname{adj} D \cdot X(t)$, where X and Y are the vector
representations of the signals $x_i(t)$ and $y_i(t)$, D is the matrix representation of said relative
delays, and adj D is its adjugate matrix.
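For two spaced omnidirectional microphones with integer sample delays, one plausible delay matrix is $D = \begin{pmatrix} 1 & z^{-d_{12}} \\ z^{-d_{21}} & 1 \end{pmatrix}$, where $z^{-d}$ denotes a delay of d samples; its adjugate is $\begin{pmatrix} 1 & -z^{-d_{12}} \\ -z^{-d_{21}} & 1 \end{pmatrix}$. Applying adj D then amounts to delay-and-subtract, as in this Python sketch. The mixing model (direct paths undelayed, cross paths delayed by `d12` and `d21`), the integer delays, and the function names are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def delayed(sig: Sequence[float], d: int, t: int) -> float:
    """Return sig[t - d], treating samples before the start of the record as zero."""
    k = t - d
    return sig[k] if 0 <= k < len(sig) else 0.0


def mix(s1: Sequence[float], s2: Sequence[float], d12: int, d21: int) -> Tuple[List[float], List[float]]:
    """Assumed anechoic model: each microphone hears its own source directly
    and the other source after a pure delay (X = D S)."""
    n = len(s1)
    x1 = [s1[t] + delayed(s2, d12, t) for t in range(n)]
    x2 = [s2[t] + delayed(s1, d21, t) for t in range(n)]
    return x1, x2


def adjugate_unmix(x1: Sequence[float], x2: Sequence[float], d12: int, d21: int) -> Tuple[List[float], List[float]]:
    """Y = adj(D) X: y1(t) = x1(t) - x2(t - d12), y2(t) = x2(t) - x1(t - d21)."""
    n = len(x1)
    y1 = [x1[t] - delayed(x2, d12, t) for t in range(n)]
    y2 = [x2[t] - delayed(x1, d21, t) for t in range(n)]
    return y1, y2
```

Under this model $y_1(t) = s_1(t) - s_1(t - d_{12} - d_{21})$: the second source cancels completely, and the first is left with a comb-filter coloration, which later stages may need to correct.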

22. The method of claim 21, wherein said choosing steps (e) and (j) comprise the
steps of: computing Y(t) explicitly from the currently available input signals X(t); and
calculating said instantaneous performance value by operating on the values Y(t) for at least a
duration of time equal to the averaging time of the expectation operator E[] plus the
maximum value of said relative delays.

23. The method of claim 21, further comprising the step of storing said plurality of
input signals X(t) for at least a duration of time equal to the averaging time of the
expectation operator E[] plus the maximum value of said relative delays; and wherein said
choosing steps of (e) and (j) compute Y(t) explicitly from said storage of input signals X(t),
and subsequently said instantaneous performance value.

24. The method of claim 22, wherein said choosing steps (e) and (j) comprise the
steps of:

precalculating, once per clock cycle, the quantity $e_{ij}(k_i, k_j) = E\{x_i(t - k_i\Delta t)\,x_j(t - k_j\Delta t)\}$

for every (i, j) pair, and for each such pair, for every $(k_i, k_j)$ pair; and

computing C(t), as required, from the quantities $e_{ij}(k_i, k_j)$ by implementing
the necessary time delays using linear-phase, non-causal FIR filters with a truncated
sinc-shaped impulse response.
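Claim 24 implements non-integer delays with truncated-sinc FIR filters. The ideal sinc interpolator is an exact delay with linear phase; truncating it to `2 * half_len + 1` taps, as in this hedged sketch, only approximates that response. The tap window is centred on the nearest integer to the delay, `half_len` is an arbitrary choice, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def sinc(v: float) -> float:
    """Normalized sinc, sin(pi v) / (pi v), with sinc(0) = 1."""
    return 1.0 if v == 0.0 else math.sin(math.pi * v) / (math.pi * v)


def fractional_delay(x: Sequence[float], delay: float, half_len: int = 16) -> List[float]:
    """Approximate x(t - delay) with a truncated-sinc FIR filter.

    Taps h[k] = sinc(k - delay) for k in [m - half_len, m + half_len], m = round(delay).
    Taps with k < 0 read future samples x[t - k], which makes the filter non-causal.
    """
    m = round(delay)
    ks = range(m - half_len, m + half_len + 1)
    h = [sinc(k - delay) for k in ks]
    n = len(x)
    y = []
    for t in range(n):
        acc = 0.0
        for hk, k in zip(h, ks):
            if 0 <= t - k < n:  # samples outside the record count as zero
                acc += hk * x[t - k]
        y.append(acc)
    return y
```

For an integer delay every tap except the centre one is (numerically) zero, so the filter reduces to a plain shift; `fractional_delay(x, 0.5)` would instead interpolate halfway between samples.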

25. A method of processing waves from a plurality of sources, comprising: 
receiving said waves, including echoes and reverberations thereof, by a plurality 

of transducer means; 

converting said waves, including echoes and reverberations thereof from said
plurality of sources, by each of said plurality of transducer means into a signal,
thereby generating a plurality of signals;

calculating a set of a plurality of signal parameters by:

generating a fixed plurality of sets of a plurality of signal parameters; 
storing said fixed plurality of sets; 

generating a plurality of instantaneous performance values for each set,
with each instantaneous performance value generated at a different time;

combining said plurality of instantaneous performance values for each set
to produce a plurality of cumulative performance values, with a cumulative
performance value produced for each set;

storing said plurality of cumulative performance values;
periodically generating a new set of a plurality of signal parameters;

generating a new cumulative performance value, based upon said new 

set; 

comparing said new cumulative performance value to said plurality of 
stored cumulative performance values; 
based upon said comparing step, either:

i) replacing one of said stored plurality of cumulative performance 
values, by said new cumulative performance value, and the corresponding 
stored set by said new set; or 

ii) deleting said new cumulative value and said new set; 

comparing said stored plurality of cumulative performance values;

choosing one of said stored plurality of cumulative performance values, 
based upon said comparison; 

choosing one set, corresponding to said chosen one cumulative 
performance value; 

supplying said chosen one set to a first processing means for operation
thereon;

first processing said plurality of signals, using said chosen one set, 
corresponding to said chosen one cumulative performance value, to generate a 
plurality of first processed signals, wherein each of said first processed signals 
represents waves from one source, and a reduced amount of waves from other
sources; and then

secondly processing said plurality of first processed signals to generate a 
plurality of second processed signals, wherein in the presence of echoes and 
reverberations of said waves from said plurality of sources, each of said second 
processed signals represents waves from only one different source.

26. The method of claim 25 wherein said transducer means are spaced apart 
omnidirectional microphones, and said chosen one set of plurality of signal parameters has a 
set of relative delay parameters associated therewith; said first processing step further 
comprises: 

delaying said plurality of signals using said set of relative delay parameters and
generating a plurality of delayed signals in response thereto; and

combining each one of said plurality of signals with at least one of said 
plurality of delayed signals to produce one of said first processed signals. 

27. The method of claim 26 further comprising the step of: 

filtering each of said second processed signals to generate a plurality of third
processed signals.



28. The method of claim 27, further comprising the step of:

sampling and converting each one of said plurality of signals and for supplying
same to said plurality of delay means and to said plurality of combining means, as
said signal.

29. The method of claim 25, wherein said second processing step further
comprises:

subtracting by a plurality of combining means one of said first processed 
signals received at said first input, and the sum of input signals received at a second 
input, to produce an output signal, said output signal being one of said plurality of 
second processed signals; 

generating a plurality of adaptive signals, with each of said adaptive signals 
being the output signal of one of said plurality of combining means; and 

supplying each of said plurality of adaptive signals to the second input of said
plurality of combining means other than the associated one combining means.
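The second processing step of claim 29 cross-couples two combining means through adaptive filters. The claims do not state an adaptation rule, so this sketch assumes a least-mean-squares update that reduces each output's power, and hence its correlation with the other channel's past outputs. It also assumes each adaptive filter sees only past output samples, which keeps the feedback loop computable. The class name and the parameters `taps` and `mu` are hypothetical.

```python
from typing import Tuple


class CrossCoupledCanceller:
    """Two combining means, each subtracting an adaptively filtered copy
    of the other channel's past outputs (assumed realization of claim 29)."""

    def __init__(self, taps: int, mu: float):
        self.mu = mu
        self.a12 = [0.0] * taps  # filter from output 2 into combiner 1
        self.a21 = [0.0] * taps  # filter from output 1 into combiner 2
        self.h1 = [0.0] * taps   # tapped delay line of y1(t-1), y1(t-2), ...
        self.h2 = [0.0] * taps   # tapped delay line of y2(t-1), y2(t-2), ...

    def step(self, z1: float, z2: float) -> Tuple[float, float]:
        # Combining means: first processed signal minus the adaptive signal.
        y1 = z1 - sum(a * v for a, v in zip(self.a12, self.h2))
        y2 = z2 - sum(a * v for a, v in zip(self.a21, self.h1))
        # LMS-style update (assumption): gradient step that lowers y1^2 and y2^2.
        for k in range(len(self.a12)):
            self.a12[k] += self.mu * y1 * self.h2[k]
            self.a21[k] += self.mu * y2 * self.h1[k]
        # Shift the new outputs into the delay lines.
        self.h1 = [y1] + self.h1[:-1]
        self.h2 = [y2] + self.h2[:-1]
        return y1, y2
```

The step size `mu` trades convergence speed against stability; feedback structures of this kind can diverge if it is too large.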

30. The method of claim 25, wherein said step of generating a new set comprises 
arithmetically combining random values from a pseudorandom number generator, one set in 
said plurality of sets, and recent values of said plurality of input signals. 

31. The method of claim 30 wherein: 

said comparing step compares the cumulative performance value of said new set with 
the cumulative performance value of one of said stored plurality of sets having a least 
cumulative performance value; 

and wherein said replacing step replaces one of said stored plurality of sets having 
said least cumulative performance value. 

32. The method of claim 31, wherein:
said replacing step further comprises: 

generating an initial cumulative performance value for said new set by subtracting a 
fixed value from a cumulative performance value of one of said stored plurality of sets 
having the greatest cumulative performance value; and 

wherein said comparing step chooses one of said stored plurality of sets having the greatest 
cumulative performance value, for processing by said processing step. 

33. An adaptive filter for determining a set of a plurality of signal parameters for
use in an adaptive filter signal processor to process a plurality of input signals to generate a 
plurality of processed signals, said filter comprising: 

means for generating a fixed plurality of sets; 

means for storing said fixed plurality of sets; 

means for generating a plurality of cumulative performance values, based upon 
said fixed plurality of sets, with a cumulative performance value generated for each 
set; 

means for evaluating said plurality of cumulative performance values, and
choosing one of said plurality of cumulative performance values, based upon said
evaluation; and

means for processing said plurality of input signals for a duration of time using
one set, corresponding to said chosen one cumulative performance value, from said
fixed plurality of stored sets to generate the plurality of processed signals.

34. The filter of claim 33, further comprising:

means for periodically generating a new set of a plurality of signal parameters;

means for generating a new cumulative performance value, for each set of signal 
parameters, including said new set generated; 

means for comparing said new cumulative performance value corresponding to said
new set generated to said cumulative performance values for each set of signal parameters;
and

means for either i) replacing one of said fixed number of plurality of said sets, by 
said new set; or ii) deleting said new set, in response to said comparing means. 

35. The filter of claim 34 further comprises: 

means for generating a plurality of instantaneous performance values for each set,
with each instantaneous performance value generated at a different time; and

means for combining said plurality of instantaneous performance values for each set 
to produce a plurality of cumulative performance values, with a cumulative performance 
value produced for each set. 

36. The filter of claim 35, wherein said means for generating a new cumulative
performance value comprises means for arithmetically combining random values from a
pseudorandom number generator, one set in said plurality of sets, and recent values in said
plurality of input signals.

37. The filter of claim 36 wherein: 

said evaluating means compares instantaneous performance value of said new set with
instantaneous performance value of one of said stored plurality of sets having a least
instantaneous performance value;

and wherein said replacing means replaces one of said stored plurality of sets having
said least instantaneous performance value.

38. The filter of claim 37, wherein:

said means for generating a new cumulative performance value further comprises means for
generating an initial cumulative performance value for said new set and means for subtracting
a fixed value from a cumulative performance value of one of said stored plurality of sets
having the greatest cumulative performance value; and

wherein said evaluating means comprises means for choosing one of said stored plurality of
sets having the greatest cumulative performance value, for processing by said processing
means.

39. A signal processing system for processing waves from a plurality of sources, 
said system comprising: 

a plurality of transducer means for receiving waves from said plurality of
sources, including echoes and reverberations thereof, and for generating a plurality of
signals in response thereto, wherein each of said plurality of transducer means
receives waves from said plurality of sources including echoes and reverberations
thereof, and for generating one of said plurality of signals;
means for calculating a set of a plurality of signal parameters, said calculating 
means comprising: 

means for generating a fixed plurality of sets of signal parameters; 
means for storing said fixed plurality of sets of signal parameters; 
means for generating a plurality of cumulative performance values, based 
upon said fixed plurality of sets of signal parameters, with a cumulative 
performance value generated for each set of signal parameters; 

means for evaluating said plurality of cumulative performance values, 
and choosing one of said plurality of cumulative performance values, based 
upon said evaluation, and one of said sets of signal parameters corresponding to 
said one of said plurality of cumulative performance values chosen; 
first processing means for receiving said plurality of signals, and said plurality 
of signal parameters of said set chosen for generating a plurality of first processed 
signals in response thereto, wherein each of said first processed signals represents 
waves from one source, and a reduced amount of waves from other sources; and 

second processing means for receiving said plurality of first processed signals
and for generating a plurality of second processed signals in response thereto, wherein
each of said second processed signals represents waves from only one source.

40. The system of claim 39, further comprising: 

means for generating a direction of arrival signal for said waves;
wherein said first processing means generates said plurality of first
processed signals in response to said direction of arrival signal.

41. The system of claim 39, wherein the number of transducer means is two, and 
the number of sources is two. 

42. The system of claim 39, wherein said transducer means are spaced apart
omnidirectional microphones and wherein said chosen one set of plurality of signal
parameters has a set of relative delay parameters, and said first processing means comprises:

a plurality of delay means, each for receiving one of said plurality of signals
and using said set of relative delay parameters for generating a plurality of delayed
signals in response thereto; and

a plurality of combining means, each for receiving at least one delayed signal
and one of said plurality of signals and for combining said received delayed signal
and said signal to produce one of said first processed signals.

43. The system of claim 39, wherein said plurality of transducer means are co-located
directional microphones and wherein said one of said set of signal parameters has a
set of gain parameters associated therewith, and wherein said first processing means comprises:

a plurality of multiplying means, each for receiving different ones of said
plurality of signals and said set of gain parameters and for generating a scaled signal
in response thereto; and

a plurality of combining means, each for receiving at least one scaled signal
and one of said plurality of signals and for combining said received scaled signal and
said signal to produce one of said first processed signals.

44. The system of claims 39, 40, 41, 42, and 43, wherein said second processing
means comprises:

a plurality of combining means, each combining means having a first input, at
least one other input, and an output; each of said combining means for receiving one
of said first processed signals at said first input, an input signal at said other input,
and for generating an output signal, at said output; said output signal being one of
said plurality of second processed signals and is a difference between said first
processed signal received at said first input and the sum of said input signal received
at said other input;

a plurality of adaptive filter means for generating a plurality of adaptive signals,
each of said adaptive filter means for receiving said output signal from one of said
plurality of combining means and for generating an adaptive signal in response
thereto; and

means for supplying each of said plurality of adaptive signals to one of said
other input of said plurality of combining means other than the associated one
combining means.

45. The system of claim 44, further comprising means for filtering each of said
second processed signals to generate a plurality of third processed signals.

46. The system of claim 45, wherein said second processed signals are characterized
by having a low frequency component and a high frequency component, and wherein said
filtering means boosts the low frequency component relative to the high frequency
component of said second processed signals.
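Claims 46 and 50 boost low frequencies relative to high ones without naming a filter. One reason such a boost helps is that a delay-and-subtract output such as $s_1(t) - s_1(t - d)$ attenuates low frequencies. A hypothetical choice, shown below, is a first-order leaky integrator $y(t) = x(t) + r\,y(t-1)$ with $0 < r < 1$, whose gain is $1/(1-r)$ at DC and $1/(1+r)$ at the Nyquist frequency.

```python
from typing import List, Sequence


def leaky_integrator(x: Sequence[float], r: float) -> List[float]:
    """y[t] = x[t] + r * y[t-1]; for 0 < r < 1 this boosts low frequencies relative to high ones."""
    if not 0.0 < r < 1.0:
        raise ValueError("r must lie strictly between 0 and 1 for a stable boost")
    y: List[float] = []
    prev = 0.0
    for v in x:
        prev = v + r * prev
        y.append(prev)
    return y
```

With r = 0.5, a constant input settles at twice its level while an alternating (Nyquist-rate) input settles at two thirds of its amplitude, a 3:1 low-to-high gain ratio.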

47. A signal processing system for processing waves from a plurality of sources, 
said system comprising: 

a plurality of transducer means for receiving waves from said plurality of
sources, including echoes and reverberations thereof, and for generating a plurality of
signals in response thereto, wherein each of said plurality of transducer means
receives waves from said plurality of sources including echoes and reverberations
thereof, and for generating one of said plurality of signals;

an adaptive filter for generating a plurality of signal parameters, said filter
comprising:

means for generating a fixed plurality of sets of signal parameters;

means for storing said fixed plurality of sets of signal parameters;
means for generating a plurality of cumulative performance values, based
upon said fixed plurality of sets of signal parameters, with a cumulative
performance value generated for each set;
means for evaluating said plurality of performance values, and choosing
one of said plurality of performance values and its corresponding set of
plurality of signal parameters, based upon said evaluation;
first processing means for receiving said plurality of signals and said plurality
of signal parameters and for generating a plurality of first processed signals in
response thereto, wherein in the absence of echoes and reverberations of said waves
from said plurality of sources, each of said first processed signals represents waves
from only one different source; and

second processing means for receiving said plurality of first processed signals 
and for generating a plurality of second processed signals in response thereto, wherein 
in the presence of echoes and reverberations of said waves from said plurality of 
sources, each of said second processed signals represents waves from only one source. 

48. The system of claim 47, wherein said waves are acoustic waves, and said
transducer means are microphones.

49. The system of claim 48 further comprising means for filtering each of said 
second processed signals to generate a plurality of third processed signals. 



50. The system of claim 49, wherein said second processed signals are characterized
by having a low frequency component and a high frequency component, and wherein said
filtering means boosts the low frequency component relative to the high frequency
component of said second processed signals.

51. The system of claim 49, wherein said microphones are spaced apart
omnidirectional microphones and wherein said corresponding set of signal parameters has a
set of relative delay parameters associated therewith; and

said first processing means comprises:

a plurality of delay means, each for receiving one of said plurality of signals
and said set of relative delay parameters and for generating a delayed signal in
response thereto; and

a plurality of combining means, each for receiving at least one delayed signal
and one of said plurality of signals and for combining said received delayed signal
and said signal to produce one of said first processed signals.

52. The system of claim 48, wherein said microphones are co-located directional
microphones, wherein said corresponding set of signal parameters has a set of gain parameters
associated therewith; and

said first processing means comprises:

a plurality of multiplying means, each for receiving different ones of said
plurality of signals and said set of gain parameters and for generating a scaled signal
in response thereto; and

a plurality of combining means, each for receiving at least one scaled signal
and one of said plurality of signals and for combining said received scaled signal and
said signal to produce one of said first processed signals.

53. The system of claims 47, 48, 49, 50, 51, and 52, wherein said second
processing means comprises:

a plurality of combining means, each combining means having a first input, at
least one other input, and an output; each of said combining means for receiving one
of said first processed signals at said first input, an input signal at said other input,
and for generating an output signal, at said output; said output signal being one of
said plurality of second processed signals and is a difference between said first
processed signal received at said first input and the sum of said input signal received
at said other input;

a plurality of adaptive filter means for generating a plurality of adaptive signals,
each of said adaptive filter means for receiving said output signal from one of said
plurality of combining means and for generating an adaptive signal in response
thereto; and

means for supplying each of said plurality of adaptive signals to one of said
other input of said plurality of combining means other than the associated one
combining means.

54. The system of claim 53, wherein each of said adaptive filter means comprises a
tapped delay line.

55. An adaptive filter signal processing system for processing waves from a plurality
of sources, said system comprising:

a plurality of transducer means for receiving waves from said plurality of
sources, including echoes and reverberation thereof, and for generating a plurality of
signals in response thereto, wherein each of said plurality of transducer means
receives waves from said plurality of sources including echoes and reverberations
thereof, and for generating one of said plurality of signals;

an adaptive filter for generating a plurality of signal parameters, said filter
comprising:

means for generating a fixed plurality of sets of signal parameters;
means for storing said fixed plurality of sets of signal parameters;

means for generating a plurality of performance values, based upon said
fixed plurality of sets of signal parameters, with a performance value generated
for each set;

means for evaluating said plurality of performance values, and choosing
one of said plurality of performance values, and its corresponding set of signal
parameters, based upon said evaluation;

first processing means comprising a beamformer for receiving said plurality of
signals and said plurality of signal parameters, and for generating a plurality of first
processed signals in response thereto, wherein each of said first processed signals
represents waves from one source, and a reduced amount of waves from other
sources; and

second processing means for receiving said plurality of first processed signals
and for generating a plurality of second processed signals in response thereto, wherein
each of said second processed signals represents waves from only one source.






56. The system of claim 55, wherein said transducer means are spaced apart
omnidirectional microphones and said corresponding set of signal parameters has a set of
delay parameters associated therewith, and wherein said first processing means comprises:

a plurality of delay means, each for receiving one of said plurality of signals
and said set of delay parameters, and for generating a delayed signal in response
thereto; and

a plurality of combining means, each for receiving at least one delayed signal
and one of said plurality of signals and for combining said received delayed signal
and said signal to produce one of said first processed signals.

57. The system of claim 55, wherein said second processing means comprises:

a plurality of combining means, each combining means having a first input, at
least one other input, and an output; each of said combining means for receiving one
of said first processed signals at said first input, an input signal at said other input,
and for generating an output signal, at said output; said output signal being one of
said plurality of second processed signals and is a difference between said first
processed signal received at said first input and the sum of said input signal received
at said other input;

a plurality of adaptive filter means for generating a plurality of adaptive signals,
each of said adaptive filter means for receiving said output signal from one of said
plurality of combining means and for generating an adaptive signal in response
thereto; and

means for supplying each of said plurality of adaptive signals to one of said
other input of said plurality of combining means other than the associated one
combining means.

58. The system of claims 55, 56, and 57, wherein said first processing means
comprises analog circuits.

59. The system of claims 55, 56, and 57, wherein said second processing means
comprises analog circuits.

60. The system of claims 55, 56, and 57, wherein said first processing means are a
part of a digital signal processor.

61. The system of claims 55, 56, and 57, wherein said second processing means are
a part of a digital signal processor.

62. The system of claims 55, 56, and 57, wherein said first processing means are a
part of a general purpose computer.

63. The system of claims 55, 56, and 57, wherein said second processing means are
a part of a general purpose computer.

64. The system of claims 55, 56, and 57, wherein said first processing means are
reconfigurable gate array circuits.

65. The system of claims 55, 56, and 57, wherein said second processing means are
reconfigurable gate array circuits.






66. The system of claim 55, further comprising:

a plurality of third processing means for receiving said plurality of second 
processed signals and for removing frequency coloration therefrom. 

67. The system of claim 56, further comprising:

a plurality of sampling digital converting means, each for receiving a different
one of said plurality of signals and for generating a digital signal; said digital signal
supplied to said plurality of delay means and to said plurality of combining means, as
said signal.

68. The system of claims 55, 56, and 57 further comprises:

means for periodically generating a new set of a plurality of signal parameters;

means for generating a new cumulative performance value, for each of said sets, 
including said new set; 

means for comparing said new cumulative performance values; and 

switch means for either i) replacing one of said fixed number of said plurality of
sets by said new set, or ii) deleting said new set, in response to said comparing means.

69. The system of claim 68 further comprises:

means for generating a plurality of instantaneous performance values for each of said 
fixed plurality of sets, with each instantaneous performance value generated at a different 
time; 

means for combining said plurality of instantaneous performance values for each set
to produce a plurality of cumulative performance values, with a cumulative performance
value produced for each set.
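The population bookkeeping of claims 68 and 69 can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the method of the specification: a larger performance value is taken to be better, the instantaneous values are combined by a hypothetical exponentially weighted running sum with forgetting factor `decay`, and a new set is admitted only when its cumulative value exceeds the smallest stored cumulative value. The class and method names are hypothetical, and how the new set and its score are produced is left to the caller.

```python
from dataclasses import dataclass


@dataclass
class Population:
    sets: list          # fixed number of parameter sets (e.g. delay sets)
    cumulative: list    # one cumulative performance value per set
    decay: float = 0.9  # hypothetical forgetting factor for the running sum

    def update(self, instantaneous):
        """Fold one new instantaneous performance value per set into its
        cumulative value (assumed: decayed running sum)."""
        self.cumulative = [self.decay * c + v
                           for c, v in zip(self.cumulative, instantaneous)]

    def offer(self, new_set, new_cumulative):
        """Either replace the set with the least cumulative value by new_set,
        or discard new_set. Returns True if a replacement was made."""
        worst = min(range(len(self.sets)), key=self.cumulative.__getitem__)
        if new_cumulative > self.cumulative[worst]:
            self.sets[worst] = new_set
            self.cumulative[worst] = new_cumulative
            return True
        return False  # new set deleted; population unchanged
```

Because `offer` only ever overwrites an existing entry, the population keeps the fixed size recited in claim 68.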

70. The system of claim 69, wherein said means for generating a new cumulative
performance value for each of said sets, including said new set, comprises means for
arithmetically combining random values from a pseudorandom number generator, one set in
said plurality of sets, and recent values in said plurality of input signals.

71. The system of claim 70 further comprises: 

means for generating a plurality of instantaneous performance values for each of said
sets, including said new set, with each instantaneous performance value generated at a
different time;

and wherein: 

said comparing means compares the instantaneous performance value of said new set with
the instantaneous performance value of the one of said stored plurality of sets having the
least instantaneous performance value;
and wherein said replacing means replaces said one of said stored plurality of sets having
said least instantaneous performance value.
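The comparison in claim 71, against the stored set having the least instantaneous performance value, might look like this in Python. The function name is hypothetical, the lists are modified in place, and the rule that replacement occurs only when the new set's value is strictly larger is an assumption consistent with treating larger values as better.

```python
def replace_least_instantaneous(sets, inst_values, new_set, new_value):
    """Compare new_value with the least stored instantaneous value.

    If new_value is larger (assumed: larger is better), the stored set with
    the least instantaneous value is replaced in place and its index is
    returned; otherwise None is returned and the new set is discarded.
    """
    idx = min(range(len(sets)), key=lambda i: inst_values[i])
    if new_value > inst_values[idx]:
        sets[idx] = new_set
        inst_values[idx] = new_value
        return idx
    return None
```

Using the instantaneous value here lets a promising new set enter quickly, while the cumulative values of claim 69 still govern which stored sets survive over time.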

72. The system of claim 68 further comprises means for generating an initial
cumulative performance value for said new set and means for subtracting a fixed value from
a cumulative performance value of one of said stored plurality of sets having the greatest
cumulative performance value; and
wherein said comparing means comprises means for choosing one of said stored plurality of
sets having the greatest cumulative performance value, for processing by said processing
means.
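One plausible reading of claim 72 is sketched below in Python: the stored set with the greatest cumulative performance value is chosen for processing, and a fixed penalty is then subtracted from that set's cumulative value so that a long-standing leader must keep earning its position. The function name, the order of the two steps, and this interpretation of the penalty's purpose are assumptions; generation of the new set's initial cumulative value is not shown.

```python
def select_and_penalize(cumulative, penalty):
    """Return the index of the set with the greatest cumulative performance
    value (the one used for processing), then subtract a fixed penalty from
    that set's cumulative value in place."""
    best = max(range(len(cumulative)), key=lambda i: cumulative[i])
    cumulative[best] -= penalty
    return best
```

Repeated calls can hand the lead to a different set once the penalty has eroded the current leader's advantage.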